Abstract

The basic goal in combinatorial group testing is to identify a set of up to d defective items within a large population of size $n \gg d$ using a pooling strategy. Namely, the items can be grouped together in pools, and a single measurement reveals whether there are one or more defectives in the pool. The threshold model is a generalization of this idea where a measurement returns positive if the number of defectives in the pool reaches a fixed threshold u, negative if this number is below a fixed lower threshold $\ell < u$, and may behave arbitrarily otherwise. We study non-adaptive threshold group testing (in a possibly noisy setting) and show that, for this problem, $O(d^{g+2} (\log d) \log(n/d))$ measurements (where $g := u - \ell$ and u is any fixed constant) suffice to identify the defectives, and also present almost matching lower bounds. This significantly improves the previously known (non-constructive) upper bound $O(d^{u+1} \log(n/d))$. Moreover, we obtain a framework for explicit construction of measurement schemes using lossless condensers. The number of measurements resulting from this scheme is ideally bounded by $O(d^{g+3} (\log d) \log n)$. Using state-of-the-art constructions of lossless condensers, however, we come up with explicit testing schemes with $O(d^{g+3} (\log d)\, \mathrm{quasipoly}(\log n))$ and $O(d^{g+3+\beta}\, \mathrm{poly}(\log n))$ measurements, for an arbitrary constant $\beta > 0$.

1 Introduction

Combinatorial group testing is a classical problem that deals with identification of sparse Boolean vectors using disjunctive queries. Suppose that among a large set of n items it is suspected that, for some sparsity parameter $d \ll n$, up to d items might be "defective". In technical terms, defective items are known as positives and the rest are called negatives. In a pooling strategy, the items may be arbitrarily grouped in pools, and a single "measurement" reveals whether there are one or more positives within the chosen pool. The basic goal in group testing is to design the pools in such a way that the set of positives can be identified from a number of measurements that is substantially less than n.

Since its introduction in the 1940s [15], group testing and its variations have been extensively studied and have found surprisingly many applications in seemingly unrelated areas. In particular, we mention applications in molecular biology and DNA library screening (cf. [3, 19, 27, 29, 33, 38, 39] and the references therein), multi-access communication [37], data compression [23], pattern matching [11], streaming algorithms [12], software testing [2], compressed sensing [13], and secure key distribution [5], among others. We refer the reader to [16, 17] for an extensive review of the major results in this area.
Formally, in classical group testing one aims to learn an unknown d-sparse¹ Boolean vector $(x_1, \ldots, x_n) \in \{0,1\}^n$ using a set of m measurements, where each measurement is defined by a subset of the coordinates $I \subseteq [n]$ and outputs the logical "or" $\bigvee_{i \in I} x_i$. The goal is then to design the measurements in such a way that all d-sparse vectors become uniquely identifiable using as few measurements as possible.
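The following Python sketch is only an illustration of the measurement model (it is not part of the paper's constructions): pools are encoded as the rows of a 0/1 matrix, `or_tests` evaluates the disjunctive measurements, and the hypothetical helper `decode_bruteforce` lists every vector of weight at most d that is consistent with the outcomes. The exhaustive search is exponential in d and is meant for toy sizes only.

```python
from itertools import combinations

def or_tests(M, x):
    """Classical group testing: each pool (row of M) reports the OR of the items it contains."""
    return [int(any(m and xi for m, xi in zip(row, x))) for row in M]

def decode_bruteforce(M, y, d):
    """Return every vector of weight at most d whose OR outcomes equal y (exhaustive search)."""
    n = len(M[0])
    matches = []
    for k in range(d + 1):
        for support in combinations(range(n), k):
            x = [1 if j in support else 0 for j in range(n)]
            if or_tests(M, x) == y:
                matches.append(x)
    return matches

# m = 3 pools on n = 4 items; all columns are distinct and nonzero.
M = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]
x = [0, 0, 0, 1]
y = or_tests(M, x)
print(y)                                   # [1, 1, 1]
print(decode_bruteforce(M, y, d=1))        # [[0, 0, 0, 1]]
print(len(decode_bruteforce(M, y, d=2)))   # 4: too few pools for d = 2
```

With three pools, every 1-sparse vector on four items is identified uniquely, but for d = 2 the outcome [1, 1, 1] is also explained by the supports {0, 3}, {1, 3}, and {2, 3} (items indexed from 0), so the pools must be designed with the sparsity parameter in mind.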

A natural generalization of classical group testing (that we call threshold testing), introduced by Damaschke [14], considers the case where the measurement outcomes are determined by a threshold predicate instead of the logical or. Namely, this model is characterized by two integer parameters $\ell, u$ such that $0 < \ell \le u$ (that are considered fixed constants), and each measurement outputs positive if the number of positives within the corresponding pool is at least u. On the other hand, if the number of positives is strictly less than² $\ell$, the test returns negative, and otherwise the outcome can be arbitrary. In this view, classical group testing corresponds to the special case where $\ell = u = 1$. In addition to being of theoretical interest, the threshold model is interesting for applications, in particular in biology, where the measurements have reduced or unpredictable sensitivity or may depend on various factors that must be simultaneously present in the sample to result in a positive outcome.

The difference $g := u - \ell$ between the thresholds is known as the gap parameter. As shown by Damaschke [14], in threshold group testing identification of the set of positives is only possible when the number of positives is at least u. Moreover, regardless of the number of measurements, in general the set of positives can be only approximately identified within up to g false positives and g false negatives (thus, unique identification can only be guaranteed when $\ell = u$). Additionally, Damaschke constructed a scheme for identification of the positives in the threshold model. For the gap-free case where g = 0, the number of measurements in this scheme is $O(d \log n)$, which is nearly optimal (within constant factors). However, when g > 0, the number of measurements becomes $O(d n^b + d^u)$, for an arbitrary constant b > 0, if up to $g + (u - 1)/b$ misclassifications are allowed.

A drawback of the scheme presented by Damaschke is that the measurements are adaptive; i.e., the group chosen by each measurement can depend on the outcomes of the previous ones. For numerous applications (in particular, in molecular biology), adaptive measurements are infeasible and must be avoided. In a non-adaptive setting, all measurements must be specified before their outcomes are revealed. This makes it convenient to think of the measurements in a matrix form. Specifically, a non-adaptive measurement matrix is an $m \times n$ Boolean matrix whose ith row is the characteristic vector of the set of items participating in the ith pool, and the goal is to design a suitable measurement matrix.
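A minimal sketch of the non-adaptive threshold model in matrix form, assuming each row is the characteristic vector of a pool. The function name `threshold_outcomes` is our own, and since the model leaves the outcomes inside the gap unspecified, modelling them by a seeded coin flip is purely an illustrative choice.

```python
import random

def threshold_outcomes(M, x, ell, u, rng=None):
    """Outcomes of non-adaptive threshold tests with lower threshold ell and upper threshold u.

    A pool with at least u positives is positive, one with fewer than ell positives is
    negative, and inside the gap (ell <= count < u) the outcome is arbitrary; here it is
    drawn from `rng` (an illustrative stand-in for adversarial behaviour).
    """
    if not 0 < ell <= u:
        raise ValueError("need 0 < ell <= u")
    if rng is None:
        rng = random.Random(0)
    y = []
    for row in M:
        count = sum(m * xi for m, xi in zip(row, x))  # positives inside this pool
        if count >= u:
            y.append(1)
        elif count < ell:
            y.append(0)
        else:
            y.append(rng.randint(0, 1))  # unspecified behaviour inside the gap
    return y

M = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 1, 1, 1]]
x = [1, 1, 0, 1]
print(threshold_outcomes(M, x, ell=1, u=1))  # classical OR tests: [1, 1, 1]
print(threshold_outcomes(M, x, ell=2, u=2))  # gap-free threshold 2: [1, 0, 1]
```

Setting $\ell = u = 1$ recovers the classical disjunctive tests, while any choice with $\ell < u$ makes some outcomes depend on the (arbitrary) gap behaviour.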

More recently, non-adaptive threshold testing has been considered by Chen and Fu [6]. They observe that a generalization of the standard notion of disjunct matrices, the latter being extensively used in the literature of classical group testing, is suitable for the threshold model. Throughout this work, we refer to this generalized notion as strongly disjunct matrices and to the standard notion as classical disjunct matrices. Using strongly disjunct matrices, they show that $O(e d^{u+1} \log(n/d))$ non-adaptive measurements suffice to identify the set of positives (within g false positives/negatives) even if up to e erroneous measurements are allowed in the model. This number of measurements
almost matches (up to constant factors) the known lower bounds on the number of rows of strongly disjunct matrices. However, the dependence on the sparsity parameter is $d^{u+1}$, which might be prohibitive for an interesting range of parameters, when the thresholds are not too small (e.g., $\ell = u = 10$) and the sparsity parameter is rather large (e.g., $d = n^{1/10}$).

¹A vector is d-sparse if its support has size at most d.
²The original paper of Damaschke [14] uses a slightly different convention where the test returns negative if the number of positives in the group is at most $\ell$. Accordingly, he defines the gap between the thresholds to be $u - \ell - 1$, as opposed to $u - \ell$ in our paper.
In this work, we consider the non-adaptive threshold model in a possibly noisy setting, where a number of measurement outcomes (specified by an error parameter $e \ge 0$) may be incorrect. Our first observation is that a new variation of classical disjunct matrices (that is in general strictly weaker than strongly disjunct matrices) suffices for the purpose of threshold group testing. Moreover, we show that this weaker notion is necessary as well, and thus precisely captures the combinatorial structure of the non-adaptive threshold model. Using a randomness-efficient probabilistic construction (that requires poly(d, log n) bits of randomness), we construct generalized disjunct matrices with $O(d^{g+2} (\log d) \log(n/d))$ rows. Thus, we bring the exponent of d in the asymptotic number of measurements from u + 1 (that is optimal for strongly disjunct matrices) down to g + 2, which is independent of the actual choice of the thresholds and only depends on the gap between them. We also show that this tradeoff is essentially optimal.

We proceed to define a new auxiliary object, namely the notion of regular matrices, that turns out to be the key combinatorial object in our explicit constructions. Intuitively, given a gap $g \ge 0$, a suitable regular matrix $M_1$ can be used to take any measurement matrix $M_2$ designed for the threshold model with lower threshold $\ell = 1$ and upper threshold $u = g + 1$ and "lift" it up to a matrix that works for any arbitrary lower threshold $\ell' > 1$ and the same gap g. Therefore, for instance, in order to address the gap-free model, it would suffice to have a non-adaptive scheme for the classical group testing model with $\ell = u = 1$. This transformation is accomplished using a simple product that increases the height of the original matrix $M_2$ by a multiplicative factor equal to the height of the regular matrix $M_1$, while preserving the "low-threshold" distinguishing properties of the original matrix $M_2$.

Next, we introduce a framework for construction of regular matrices using strong lossless condensers, which are fundamental objects in derandomization theory and, more generally, in theoretical computer science. We show that, by using an optimal condenser, it is possible to construct regular matrices with only $O(d (\log d) \log n)$ rows. This almost matches the upper bound achieved by a probabilistic construction that we also present in this work. To date, no explicit construction of such optimal lossless condensers is known (though probabilistic constructions are easy to come up with). However, using the state of the art in explicit condensers [4, 22], we will come up with two explicit constructions of regular matrices with incomparable parameters: one with $O(d (\log d)\, \mathrm{quasipoly}(\log n))$ rows and another with $O(d^{1+\beta}\, \mathrm{poly}(\log n))$ rows, where $\beta > 0$ is any arbitrary constant and the exponent of the term poly(log n) depends on the choice of $\beta$. By combining regular matrices with strongly disjunct ones (designed for the lowered thresholds $\ell' = 1$ and $u' = g + 1$), we obtain our threshold testing schemes. The bounds obtained by our final schemes are summarized in Table 1. When the lower threshold $\ell$ is not too small, our explicit constructions (rows M8 and M9 of Table 1) significantly improve what was previously known to be achievable even using non-constructive proofs.

The rest of the paper is organized as follows. In Section 1.1 we introduce preliminary notions and fix some notation. Section 2 reviews the notion of strongly disjunct matrices and introduces our weaker notion (for the gap-free case g = 0), in addition to the notion of regular matrices and their properties. In Section 3 we develop our construction of regular matrices from lossless condensers, and instantiate the parameters in Section 3.1. This in particular leads to our explicit threshold testing schemes. Finally, in Section 4 we consider the case with nonzero gap, and in Section 5 we discuss future directions.

1.1 Preliminaries 

For a matrix M, we denote by M[i, j] the entry of M at the ith row and the jth column. Similarly, we denote the ith entry of a vector v by v(i). The support of a vector $x \in \{0,1\}^n$, denoted by supp(x), is a subset of $[n] := \{1, \ldots, n\}$ such that $i \in \mathrm{supp}(x)$ if and only if x(i) = 1. The Hamming weight of x, denoted by wgt(x), is defined as |supp(x)|. For an $m \times n$ Boolean matrix M and $S \subseteq [n]$, we denote by $M|_S$ the $m \times |S|$ submatrix of M formed by restricting M to the columns picked by S. Moreover, for a vector $x \in \{0,1\}^n$, we use $M[x]_{\ell,u}$ to denote the set of vectors in $\{0,1\}^m$ that correctly encode the measurement outcomes resulting from non-adaptive threshold tests defined by the measurement matrix M on x using threshold parameters $\ell, u$. In the gap-free case, this set has only a single element, which we denote by $M[x]_u$. Thus, for any $y \in M[x]_{\ell,u}$ we have y(i) = 1 if $|\mathrm{supp}(M_i) \cap \mathrm{supp}(x)| \ge u$, and y(i) = 0 if $|\mathrm{supp}(M_i) \cap \mathrm{supp}(x)| < \ell$, where $M_i$ indicates the ith row of M.

Table 1: Summary of the parameters achieved by various threshold testing schemes. The noise parameter $p \in [0, 1)$ is arbitrary, and the thresholds $\ell, u = \ell + g$ are fixed constants. "Exp" and "Rnd" respectively indicate explicit and randomized constructions. "KS" refers to the construction of strongly disjunct matrices based on Kautz-Singleton superimposed codes [24], as described in Appendix A (the bounds in rows M1-M5 are obtained by strongly disjunct matrices).

Row | Number of rows | Tolerable errors | Remarks
M1 | $O(d^{u+1} \log(n/d)/(1-p)^2)$ | $\Omega(p d \log(n/d)/(1-p)^2)$ | Rnd: Random strongly disjunct matrices.
M2 | … | … | Exp: KS using codes on the GV bound.
M3 | … | … | Exp: KS using Reed-Solomon codes.
M4 | … | … | Exp: KS using algebraic-geometric codes.
M5 | … | … | Exp: KS using Hermitian codes ($d \gg \sqrt{\log n}$).
M6 | $O(d^{g+2} (\log d) \log(n/d))$ | … | Rnd: Construction 2.
M7 | $O(d^{g+3} (\log d) \log n)$ | … | Constructions 4 and 1 combined, assuming optimal condensers and strongly disjunct matrices.
M8 | $O(d^{g+3} (\log d) T_2 \log n)$ | … | Exp: Constructions 4 and 1 combined using Theorem 10 and M2, where $T_2 = \exp(O(\log^3 \log n)) = \mathrm{quasipoly}(\log n)$.
M9 | $O(d^{g+3+\beta} T_3 \log n)$ | … | Exp: Constructions 4 and 1 combined using Theorem 11 and M2, where $\beta > 0$ is any arbitrary constant and $T_3 = ((\log n)(\log d))^{1+u/\beta} = \mathrm{poly}(\log n, \log d)$.
Lower bound | $\Omega(d^{g+2} \log_d n + e d^{g+1})$ | e | Lower bound (see Section 4).

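The min-entropy of a distribution $\mathcal{X}$ with finite support $\Omega$ is given by

$$H_\infty(\mathcal{X}) := \min_{x \in \Omega} \{-\log \mathcal{X}(x)\},$$

where $\mathcal{X}(x)$ is the probability that $\mathcal{X}$ assigns to the outcome x and the logarithm is taken to base 2. A flat distribution is one that is uniform on its support. For such a distribution $\mathcal{X}$, we have $H_\infty(\mathcal{X}) = \log(|\mathrm{supp}(\mathcal{X})|)$. The statistical distance between two distributions $\mathcal{X}$ and $\mathcal{Y}$ defined on the same finite space $\Omega$ is given by $\frac{1}{2} \sum_{s \in \Omega} |\mathcal{X}(s) - \mathcal{Y}(s)|$, which is half the $\ell_1$ distance of the two distributions when regarded as vectors of probabilities over $\Omega$. Two distributions $\mathcal{X}$ and $\mathcal{Y}$ are said to be $\epsilon$-close if their statistical distance is at most $\epsilon$. We will use the shorthand $\mathcal{U}_n$ for the uniform distribution on $\{0,1\}^n$, and $X \sim \mathcal{X}$ for a random variable X drawn from a distribution $\mathcal{X}$.

The main technical tool that we use in our explicit constructions is the notion of lossless condensers, defined below.

Definition 1. A function $f\colon \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^{\tilde{\ell}}$ is a strong lossless condenser for entropy k and with error $\epsilon$ (in short, $(k, \epsilon)$-condenser) if for every distribution $\mathcal{X}$ on $\{0,1\}^n$ with min-entropy at least k, random variable $X \sim \mathcal{X}$ and a seed $Y \sim \mathcal{U}_t$, the distribution of $(Y, f(X, Y))$ is $\epsilon$-close to some distribution $(\mathcal{U}_t, \mathcal{Z})$ with min-entropy at least $t + k$. A condenser is explicit if it is polynomial-time computable.

We will use the following "almost-injectivity" property of lossless condensers in our proofs.

Proposition 2. Let $\mathcal{X}$ be a flat distribution with min-entropy $\log K$ over a finite sample space $\Omega$ and $f\colon \Omega \to \Gamma$ be a mapping to a finite set $\Gamma$. If $f(\mathcal{X})$ is $\epsilon$-close to having min-entropy $\log K$, then there is a set $T \subseteq \Gamma$ of size at least $(1 - 2\epsilon)K$ such that, for all $x, x' \in \mathrm{supp}(\mathcal{X})$,

$$(\forall y \in T)\quad f(x) = y \wedge f(x') = y \Rightarrow x = x'.$$

Proof. Suppose that $\mathcal{X}$ is uniformly supported on a set $A \subseteq \Omega$ of size K. For each $y \in \Gamma$, define $n_y := |\{x \in A : f(x) = y\}|$. Denote by $\mu$ the distribution $f(\mathcal{X})$ over $\Gamma$ and by $\mu'$ a distribution on $\Gamma$ with min-entropy $\log K$ that is $\epsilon$-close to $\mu$, which is guaranteed to exist by the assumption. Define $T := \{y \in \Gamma : n_y = 1\}$, and similarly, $T' := \{y \in \Gamma : n_y \ge 2\}$. Observe that for each $y \in \Gamma$ we have $\mu(y) = n_y/K$, and also $\mathrm{supp}(\mu) = T \cup T'$. Thus,

$$\sum_{y \in T} n_y + \sum_{y \in T'} n_y = K. \qquad (1)$$

The fact that $\mu$ and $\mu'$ are $\epsilon$-close implies that

$$\sum_{y \in T'} \big(\mu(y) - \mu'(y)\big) \le \epsilon \;\Longrightarrow\; \sum_{y \in T'} \Big(\frac{n_y}{K} - \frac{1}{K}\Big) \le \epsilon \;\Longrightarrow\; \sum_{y \in T'} (n_y - 1) \le \epsilon K,$$

where the first implication uses $\mu'(y) \le 1/K$, which holds since $\mu'$ has min-entropy $\log K$.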
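In particular, this means that $|T'| \le \epsilon K$ (since by the choice of $T'$, for each $y \in T'$ we have $n_y \ge 2$). Furthermore,

$$\sum_{y \in T'} (n_y - 1) \le \epsilon K \;\Longrightarrow\; \sum_{y \in T'} n_y \le \epsilon K + |T'| \le 2\epsilon K.$$

This combined with (1), and the fact that $n_y = 1$ for every $y \in T$ (so that $|T| = \sum_{y \in T} n_y$), gives

$$|T| = K - \sum_{y \in T'} n_y \ge (1 - 2\epsilon)K,$$

as desired. □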
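Proposition 2 can be sanity-checked numerically. The sketch below (plain Python; the helper names are our own) computes min-entropy and statistical distance as defined above, builds a deliberately non-injective map f on a flat source of size K = 8, and compares the size of the injective part T with the bound $(1 - 2\epsilon)K$. Here $\epsilon$ is measured against the uniform distribution on $\Gamma = \{0, \ldots, 7\}$, which has min-entropy $\log K$ and is therefore one admissible choice of $\mu'$ in the proof.

```python
import math
from collections import Counter

def min_entropy(dist):
    """H_inf = min_x -log2 p(x) = -log2 max_x p(x)."""
    return -math.log2(max(dist.values()))

def stat_distance(P, Q):
    """Half the l1 distance between two distributions given as dicts."""
    keys = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(s, 0.0) - Q.get(s, 0.0)) for s in keys)

def flat_pushforward(f, support):
    """Distribution of f(X) for X flat on `support`, plus the collision counts n_y."""
    K = len(support)
    counts = Counter(f(x) for x in support)
    return {y: c / K for y, c in counts.items()}, counts

def f(x):
    # Merges 0,1 -> 0 and 2,3 -> 1; injective on 4..7.
    return x // 2 if x < 4 else x

K = 8
mu, counts = flat_pushforward(f, range(K))
mu2 = {s: 1 / K for s in range(K)}            # min-entropy log K on Gamma = {0, ..., 7}
eps = stat_distance(mu, mu2)
T = {y for y, c in counts.items() if c == 1}  # the injective part T from the proof

print(min_entropy(mu), eps, sorted(T))   # 2.0 0.25 [4, 5, 6, 7]
assert len(T) >= (1 - 2 * eps) * K       # Proposition 2: |T| >= (1 - 2 eps) K
```

In this example the bound is tight: $|T| = 4 = (1 - 2 \cdot 0.25) \cdot 8$.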

2 Variations of disjunct matrices 

The combinatorial structure used by Chen and Fu in their non-adaptive scheme is the following 
generalization of the standard notion of disjunct matrices that we refer to as strongly disjunct 
matrices throughout this work. 

Definition 3. A matrix (with at least d + u columns) is said to be strongly (d, e; u)-disjunct if for every choice of d + u columns $C_1, \ldots, C_u, C'_1, \ldots, C'_d$, all distinct, we have

$$\Big|\bigcap_{i=1}^{u} \mathrm{supp}(C_i) \setminus \bigcup_{i=1}^{d} \mathrm{supp}(C'_i)\Big| > e.$$
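Definition 3 can be checked by brute force on tiny matrices. The sketch below (hypothetical helper names; running time is exponential in d + u, and $u \ge 1$ is assumed) enumerates every choice of u columns $C_i$ and d other columns $C'_i$ and counts the rows that have 1s on all $C_i$ and 0s on all $C'_i$.

```python
from itertools import combinations

def column_supports(M):
    """supp(C_j) for every column j, as the set of rows where column j has a 1."""
    return [frozenset(i for i, row in enumerate(M) if row[j]) for j in range(len(M[0]))]

def is_strongly_disjunct(M, d, e, u):
    """Brute-force check of Definition 3 (u >= 1; tiny matrices only)."""
    cols = column_supports(M)
    n = len(cols)
    for top in combinations(range(n), u):            # the columns C_1, ..., C_u
        common = frozenset.intersection(*(cols[j] for j in top))
        others = [j for j in range(n) if j not in top]
        for bottom in combinations(others, d):       # the columns C'_1, ..., C'_d
            covered = frozenset().union(*(cols[j] for j in bottom))
            if len(common - covered) <= e:           # need MORE than e surviving rows
                return False
    return True

I4 = [[int(i == j) for j in range(4)] for i in range(4)]                  # identity matrix
P2 = [[int(j in s) for j in range(4)] for s in combinations(range(4), 2)]  # rows = all 2-subsets

print(is_strongly_disjunct(I4, 3, 0, 1))   # True: u = 1 is classical disjunctness
print(is_strongly_disjunct(I4, 1, 0, 2))   # False: no row contains two columns
print(is_strongly_disjunct(P2, 2, 0, 2))   # True
print(is_strongly_disjunct(P2, 2, 1, 2))   # False: only one row survives each choice
```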

Observe that strongly (d, e; u)-disjunct matrices are, in particular, strongly (d', e'; u')-disjunct for any $d' \le d$, $e' \le e$, and $u' \le u$. Moreover, classical (d, e)-disjunct matrices that are extensively used in the group testing literature (see [17, Ch. 7]) correspond to the special case u = 1.

To make the main ideas more transparent, until Section 4 we will focus on the gap-free case where $\ell = u$. The extension to nonzero gaps is straightforward and will be discussed in Section 4. Moreover, we will often implicitly assume that the Hamming weight of the Boolean vector that is to be identified is at least u (since otherwise, we know that confusions cannot be avoided), and will take the thresholds $\ell, u$ as fixed constants.

The notion of strongly disjunct matrices, in its general form, has been studied in the literature under different names and equivalent formulations, e.g., superimposed (u, d)-designs/codes and (u, d) cover-free families (see [5, 7, 18, 25, 34, 35] and the references therein). An important motivation for the study of this notion is the following hidden hypergraph-learning problem (cf. [16, Ch. 12]), itself being motivated by the so-called complex model in computational biology [5]: Suppose that G is a u-hypergraph³ on a vertex set V of size n, and denote by V(G) the set of vertices induced by the hyperedge set of G; i.e., $v \in V(G)$ if and only if G has a hyperedge incident to v. Then, assuming that $|V(G)| \le d$ for a sparsity parameter d, the aim is to learn G using as few (non-adaptive) queries of the following type as possible: Each query specifies a set $Q \subseteq V$, and its corresponding answer is a Boolean value which is 1 if and only if G has a hyperedge contained in Q. It is known [5, 20] that, in the hypergraph-learning problem, any suitable grouping strategy defines a strongly disjunct matrix (whose rows are characteristic vectors of individual queries Q), and conversely, any strongly disjunct matrix can be used as the incidence matrix of the set of queries. The parameter e determines the "noise tolerance" of the measurement scheme. Namely, a strongly (d, e; u)-disjunct matrix can uniquely distinguish between d-sparse hypergraphs even in the presence of up to $\lfloor e/2 \rfloor$ erroneous query outcomes.

The key observation made by Chen and Fu [6] is that threshold group testing corresponds to the special case of the hypergraph-learning problem where the hidden graph G is known to be a u-clique⁴. In this case, the unknown Boolean vector in the corresponding threshold testing problem would be the characteristic vector of V(G). It follows that strongly disjunct matrices are suitable choices for the measurement matrices in threshold group testing.

³That is, a hypergraph where each edge is a set of u vertices.

Nonconstructively, a probabilistic argument akin to the standard argument for the case of classical disjunct matrices (see [17, Ch. 7]) can be used to show that strongly (d, e; u)-disjunct matrices exist with $m = O(d^{u+1} \log(n/d)/(1-p)^2)$ rows and error tolerance $e = \Omega(p d \log(n/d)/(1-p)^2)$, for any noise parameter $p \in [0, 1)$. On the negative side, however, several concrete lower bounds are known for the number of rows of such matrices [18, 34, 35]. In asymptotic terms, these results show that one must have $m = \Omega(d^{u+1} \log_d n + e d^u)$, and thus, the probabilistic upper bound is essentially optimal.

For the underlying strongly disjunct matrix, Chen and Fu [6] use a greedy construction [7] that achieves, for any $e \ge 0$, $O((e + 1) d^{u+1} \log(n/d))$ rows, but may take exponential time in the size of the resulting matrix. Nevertheless, as observed by several researchers [5, 18, 20, 25], a classical explicit construction of combinatorial designs due to Kautz and Singleton [24] can be extended to construct strongly disjunct matrices. This concatenation-based construction transforms any error-correcting code having large distance into a disjunct matrix. While the original construction uses Reed-Solomon codes and achieves nice bounds, it is possible to use other families of codes. In particular, as recently shown by Porat and Rothschild [30], codes on the Gilbert-Varshamov bound (cf. [28]) result in nearly optimal disjunct matrices. Moreover, for a suitable range of parameters, they give a deterministic construction of such codes that runs in polynomial time in the size of the resulting disjunct matrix (albeit exponential in the dimension of the code⁵). We will elaborate on details of this class of constructions in Appendix A and will additionally consider a family of algebraic-geometric codes and Hermitian codes which give incomparable bounds, as summarized in Table 1 (rows M2-M5).

Even though, as discussed above, the general notion of strongly (d, e; u)-disjunct matrices is sufficient for threshold group testing with upper threshold u, in this section we show that a new, weaker notion of disjunct matrices defined below (which, as we show later, turns out to be strictly weaker when u > 1) would also suffice. We also define an auxiliary notion of regular matrices.

Definition 4. A Boolean matrix M with n columns is called (d, e; u)-regular if for every subset of columns $S \subseteq [n]$ (called the critical set) and every $Z \subseteq [n]$ (called the zero set) such that $u \le |S| \le d$, $|Z| \le |S|$, $S \cap Z = \emptyset$, there are more than e rows of M at which $M|_S$ has weight exactly u and (at the same rows) $M|_Z$ has weight zero. Any such row is said to u-satisfy S and Z. If, in addition, for every distinguished column $i \in S$, more than e rows of M both u-satisfy S and Z and have a 1 at the ith column, the matrix is called (d, e; u)-disjunct (and the corresponding "good" rows are said to u-satisfy i, S, and Z).

To distinguish between the above variant of disjunct matrices and strongly disjunct matrices 
or classical disjunct matrices, we will refer to our variant as threshold disjunct matrices, or (when 
there is no risk of confusion) simply disjunct matrices. 
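For small matrices, Definition 4 can likewise be verified exhaustively. In this illustrative sketch (names are our own), `is_regular` enumerates all critical sets and zero sets and, when `disjunct=True`, also checks every distinguished column. The examples use the identity matrix and the matrix P2 whose rows are all 2-subsets of four columns: P2 is (2, 0; 2)-disjunct but not (2, 0; 1)-regular, since no row has weight exactly 1 on {0, 1} and weight 0 on {2, 3}.

```python
from itertools import combinations

def u_satisfying_rows(M, S, Z, u):
    """Rows of M whose restriction to S has weight exactly u and whose restriction to Z is zero."""
    return [r for r, row in enumerate(M)
            if sum(row[j] for j in S) == u and not any(row[j] for j in Z)]

def is_regular(M, d, e, u, disjunct=False):
    """Brute-force check of Definition 4; disjunct=True also checks each distinguished column."""
    n = len(M[0])
    for s in range(u, d + 1):                      # critical sets with u <= |S| <= d
        for S in combinations(range(n), s):
            rest = [j for j in range(n) if j not in S]
            for z in range(s + 1):                 # zero sets disjoint from S, |Z| <= |S|
                for Z in combinations(rest, z):
                    good = u_satisfying_rows(M, S, Z, u)
                    if len(good) <= e:
                        return False
                    if disjunct and any(sum(M[r][i] for r in good) <= e for i in S):
                        return False
    return True

I4 = [[int(i == j) for j in range(4)] for i in range(4)]
P2 = [[int(j in s) for j in range(4)] for s in combinations(range(4), 2)]

print(is_regular(P2, 2, 0, 2))                  # True
print(is_regular(P2, 2, 0, 2, disjunct=True))   # True
print(is_regular(P2, 2, 0, 1))                  # False: S = {0, 1}, Z = {2, 3}
print(is_regular(I4, 2, 0, 1, disjunct=True))   # True
```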

It is easy to verify that (assuming $2d \le n$) the classical notion of (2d − 1, e)-disjunct matrices is equivalent to strongly (2d − 1, e; 1)-disjunct and (d, e; 1)-disjunct. Moreover, any (d, e; u)-disjunct matrix is (d, e; u)-regular, (d − 1, e; u − 1)-regular, and classical (d, e)-disjunct (but the reverse implications do not in general hold). Therefore, the above-mentioned lower bound of $m = \Omega(d^2 \log_d n + e d)$ that applies for classical (d, e)-disjunct matrices holds for (d, e; u)-disjunct matrices as well. Below we show that our notion of disjunct matrices is necessary and sufficient for the purpose of threshold group testing.

⁴A u-clique on the vertex set V is a u-hypergraph (V, E) such that, for some $V' \subseteq V$, E is the set of all subsets of $V'$ of size u.

⁵In this regard, this construction of disjunct matrices can be considered weakly explicit in that, contrary to fully explicit constructions, it is not clear if each individual entry of the matrix can be computed in time poly(d, log n).

Lemma 5. Let M be an $m \times n$ Boolean matrix that is (d, e; u)-disjunct. Then for every pair of distinct d-sparse vectors $x, x' \in \{0,1\}^n$ such that⁶ $\mathrm{supp}(x) \not\subseteq \mathrm{supp}(x')$, $\mathrm{wgt}(x) \ge |\mathrm{supp}(x') \setminus \mathrm{supp}(x)|$ and $\mathrm{wgt}(x) \ge u$, we have

$$|\mathrm{supp}(M[x]_u) \setminus \mathrm{supp}(M[x']_u)| > e. \qquad (2)$$

Conversely, if M satisfies (2) for every choice of x and x' as above, it must be $(\lfloor d/2 \rfloor, e; u)$-disjunct.

Proof. First, suppose that M is (d, e; u)-disjunct, and let $y := M[x]_u$ and $y' := M[x']_u$. Take any $i \in \mathrm{supp}(x) \setminus \mathrm{supp}(x')$, and let $S := \mathrm{supp}(x)$ and $Z := \mathrm{supp}(x') \setminus \mathrm{supp}(x)$. Note that $|S| \le d$ and, by assumption, $|Z| \le |S|$. Now, Definition 4 implies that there is a set E of more than e rows of M that u-satisfy i as the distinguished column, S as the critical set and Z as the zero set. Thus for every $j \in E$, the jth row of M restricted to the columns chosen by supp(x) must have weight exactly u, while its weight on supp(x') is less than u (since $\mathrm{supp}(x') \subseteq (S \setminus \{i\}) \cup Z$, the row has a 1 at column i, and it has weight zero on Z). Therefore, y(j) = 1 and y'(j) = 0 for more than e choices of j.

For the converse, consider any choice of a distinguished column $i \in [n]$, a critical set $S \subseteq [n]$ containing i (such that $u \le |S| \le \lfloor d/2 \rfloor$), and a zero set $Z \subseteq [n] \setminus S$ where $|Z| \le |S|$. Define d-sparse Boolean vectors $x, x' \in \{0,1\}^n$ so that $\mathrm{supp}(x) := S$ and $\mathrm{supp}(x') := Z \cup (S \setminus \{i\})$. Let $y := M[x]_u$, $y' := M[x']_u$ and $E := \mathrm{supp}(y) \setminus \mathrm{supp}(y')$. By assumption we know that $|E| > e$. Take any $j \in E$. Since y(j) = 1 and y'(j) = 0, we get that the jth row of M restricted to the columns picked by $Z \cup (S \setminus \{i\})$ must have weight at most u − 1, whereas it must have weight at least u when restricted to S. As the sets {i}, $S \setminus \{i\}$, and Z are disjoint, this can hold only if M[j, i] = 1, and moreover, the jth row of M restricted to the columns picked by S (resp., Z) has weight exactly u (resp., zero). Hence, this row (as well as all the rows of M picked by E) must u-satisfy i, S, and Z, confirming that M is $(\lfloor d/2 \rfloor, e; u)$-disjunct. □
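Lemma 5 can be illustrated numerically. The sketch below (our own helper names, exhaustive and only for tiny n) computes the smallest value of $|\mathrm{supp}(M[x]_u) \setminus \mathrm{supp}(M[x']_u)|$ over all pairs (x, x') allowed in the lemma. For the matrix P2 whose rows are all 2-subsets of four columns, every critical pair {a, b} is 2-satisfied by the row {a, b}, so P2 is (2, 0; 2)-disjunct and the lemma predicts a separation larger than e = 0.

```python
from itertools import combinations

def gapfree_outcomes(M, support, u):
    """M[x]_u for the vector x with the given support (gap-free thresholds)."""
    return [int(sum(row[j] for j in support) >= u) for row in M]

def min_separation(M, d, u):
    """Smallest |supp(M[x]_u) minus supp(M[x']_u)| over the pairs (x, x') allowed in Lemma 5."""
    n = len(M[0])
    supports = [frozenset(c) for k in range(d + 1) for c in combinations(range(n), k)]
    best = None
    for S in supports:                      # S = supp(x)
        if len(S) < u:
            continue                        # Lemma 5 requires wgt(x) >= u
        y = gapfree_outcomes(M, S, u)
        for T in supports:                  # T = supp(x')
            if S <= T or len(S) < len(T - S):
                continue                    # need supp(x) not inside supp(x'), wgt(x) >= |T \ S|
            y2 = gapfree_outcomes(M, T, u)
            sep = sum(1 for a, b in zip(y, y2) if a == 1 and b == 0)
            best = sep if best is None else min(best, sep)
    return best

P2 = [[int(j in s) for j in range(4)] for s in combinations(range(4), 2)]
I4 = [[int(i == j) for j in range(4)] for i in range(4)]
print(min_separation(P2, 2, 2))              # 1 > e = 0
print(min_separation(I4, 2, 1))              # 1
print(min_separation([[1, 1, 1, 1]], 1, 1))  # 0: a single pool cannot separate items
```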

We will use regular matrices as intermediate building blocks in our constructions of disjunct matrices to follow. The connection with disjunct matrices is made apparent through a direct product of matrices defined in Construction 1. Intuitively, using this product, regular matrices can be used to transform any measurement matrix suitable for the standard group testing model into one with comparable properties in the threshold model. The following lemma formalizes the idea.

Lemma 6. Let $M_1$ and $M_2$ be Boolean matrices with n columns, such that $M_1$ is $(d - 1, e_1; u - 1)$-regular. Let $M := M_1 \odot M_2$, and suppose that for d-sparse Boolean vectors $x, x' \in \{0,1\}^n$ such that $\mathrm{wgt}(x) \ge \mathrm{wgt}(x')$, we have

$$|\mathrm{supp}(M_2[x]_1) \setminus \mathrm{supp}(M_2[x']_1)| \ge e_2.$$

Then, $|\mathrm{supp}(M[x]_u) \setminus \mathrm{supp}(M[x']_u)| \ge (e_1 + 1)e_2$.

• Given: Boolean matrices $M_1$ and $M_2$ that are $m_1 \times n$ and $m_2 \times n$, respectively.

• Output: An $m \times n$ Boolean matrix $M_1 \odot M_2$, where $m := m_1 m_2$.

• Construction: Let the rows of $M := M_1 \odot M_2$ be indexed by the set $[m_1] \times [m_2]$. Then the row corresponding to (i, j) is defined as the bit-wise or of the ith row of $M_1$ and the jth row of $M_2$.

Construction 1: Direct product of measurement matrices.

⁶Observe that at least one of the two possible orderings of any two distinct d-sparse vectors, at least one having weight u or more, satisfies this condition.

Proof. First we consider the case where u > 1. Let $y := M_2[x]_1 \in \{0,1\}^{m_2}$ and $y' := M_2[x']_1 \in \{0,1\}^{m_2}$, where $m_2$ is the number of rows of $M_2$, and let $E := \mathrm{supp}(y) \setminus \mathrm{supp}(y')$. By assumption, $|E| \ge e_2$. Fix any $i \in E$ so that y(i) = 1 and y'(i) = 0. Therefore, the ith row of $M_2$ must have all zeros at positions corresponding to supp(x') and there is a $j \in \mathrm{supp}(x) \setminus \mathrm{supp}(x')$ such that $M_2[i, j] = 1$. Define $S := \mathrm{supp}(x) \setminus \{j\}$, $Z := \mathrm{supp}(x') \setminus \mathrm{supp}(x)$, $z := M[x]_u$ and $z' := M[x']_u$.

As $\mathrm{wgt}(x) \ge \mathrm{wgt}(x')$, we know that $|Z| \le |S| + 1$. The extreme case $|Z| = |S| + 1$ only happens when x and x' have disjoint supports, in which case one can remove an arbitrary element of Z to ensure that $|Z| \le |S|$, and the following argument (considering the assumption u > 1) still goes through.

By the definition of regularity, there is a set $E_1$ consisting of at least $e_1 + 1$ rows of $M_1$ that (u − 1)-satisfy the critical set S and the zero set Z. Pick any $k \in E_1$, and observe that z must have a 1 at position (k, i). This is because the row of M indexed by (k, i) has a 1 at the jth position (since the ith row of $M_2$ does), and at least u − 1 more 1's at positions corresponding to $\mathrm{supp}(x) \setminus \{j\}$ (due to regularity of $M_1$). On the other hand, note that the kth row of $M_1$ has at most u − 1 ones at positions corresponding to supp(x') (because $\mathrm{supp}(x') \subseteq S \cup Z$), and the ith row of $M_2$ has all zeros at those positions (because y'(i) = 0). This means that the row of M indexed by (k, i) (which is the bit-wise or of the kth row of $M_1$ and the ith row of $M_2$) must have fewer than u ones at positions corresponding to supp(x'), and thus, z' must be 0 at position (k, i). Therefore, z and z' differ at position (k, i).

Since there are at least $e_2$ choices for i, and for each choice of i, at least $e_1 + 1$ choices for k, we conclude that in at least $(e_1 + 1)e_2$ positions, z has a one while z' has a zero.

The argument for u = 1 is similar, in which case it suffices to take $S := \mathrm{supp}(x)$ and $Z := \mathrm{supp}(x') \setminus \mathrm{supp}(x)$. □
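The direct product of Construction 1 is easy to express in code. The following sketch is our own illustration (helper names are hypothetical): it forms $M_1 \odot M_2$ with rows ordered lexicographically by (i, j) and measures the separation quantity of Lemma 6 by exhaustive search. We take $M_1 = M_2 = I_4$ (the 4 × 4 identity), d = 2 and u = 2; one can check by hand that $I_4$ is (1, 0; 1)-regular and that, under the OR model, it separates the admissible pairs in $e_2 = 1$ positions, so the lemma guarantees a separation of at least $(e_1 + 1)e_2 = 1$.

```python
from itertools import combinations

def direct_product(M1, M2):
    """Construction 1: row (i, j) of the product is the bitwise OR of row i of M1 and row j of M2."""
    n = len(M1[0])
    if any(len(row) != n for row in M1 + M2):
        raise ValueError("M1 and M2 must have the same number of columns")
    return [[a | b for a, b in zip(r1, r2)] for r1 in M1 for r2 in M2]

def outcomes(M, support, u):
    """Gap-free threshold outcomes M[x]_u for the vector x with the given support."""
    return [int(sum(row[j] for j in support) >= u) for row in M]

def min_separation(M, n, d, u):
    """min |supp(M[x]_u) minus supp(M[x']_u)| over distinct d-sparse x, x'
    with wgt(x) >= wgt(x') and wgt(x) >= u (the setting of Lemma 6)."""
    supports = [set(c) for k in range(d + 1) for c in combinations(range(n), k)]
    best = None
    for S in supports:
        if len(S) < u:
            continue
        y = outcomes(M, S, u)
        for T in supports:
            if T == S or len(T) > len(S):
                continue
            sep = sum(1 for a, b in zip(y, outcomes(M, T, u)) if a == 1 and b == 0)
            best = sep if best is None else min(best, sep)
    return best

I4 = [[int(i == j) for j in range(4)] for i in range(4)]
M = direct_product(I4, I4)
print(len(M), M[1])                 # 16 [1, 1, 0, 0]
print(min_separation(I4, 4, 2, 1))  # e2 = 1 for M2 under the OR model
print(min_separation(M, 4, 2, 2))   # 2 >= (e1 + 1) * e2 = 1 under threshold u = 2
```

Here the product actually separates every admissible pair in two positions, (a, b) and (b, a), which is consistent with (and slightly better than) the lower bound of Lemma 6.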

As a corollary it follows that, when $M_1$ is $(d - 1, e_1; u - 1)$-regular and $M_2$ is a classical $(d, e_2)$-disjunct matrix, the product $M := M_1 \odot M_2$ will distinguish between any two distinct d-sparse vectors (of weight at least u) in at least $(e_1 + 1)(e_2 + 1)$ positions of the measurement outcomes. This combined with Lemma 5 would imply that M is, in particular, $(\lfloor d/2 \rfloor, (e_1 + 1)(e_2 + 1) - 1; u)$-disjunct. However, using a direct argument similar to the above lemma it is possible to obtain a slightly better result, given by Lemma 7.

Lemma 7. Suppose that $M_1$ is $(d, e_1; u - 1)$-regular and $M_2$ is a classical $(2d, e_2)$-disjunct matrix. Then $M_1 \odot M_2$ is a $(d, (e_1 + 1)(e_2 + 1) - 1; u)$-disjunct matrix. □

As a particular example of where Lemma 6 can be used, we remark that the measurement matrices constructed in [9] that are not necessarily disjunct but allow approximation of sparse vectors in highly noisy settings of the standard group testing model (as well as those used in adaptive two-stage schemes; cf. [8] and the references therein) can be combined with regular matrices to offer the same qualities in the threshold model. In the same way, numerous existing results in group testing can be ported to the threshold model by using Lemma 6.

3 Constructions 

In this section, we obtain several constructions of regular and disjunct matrices. Our first construction, described in Construction 2, is a randomness-efficient probabilistic construction that can be analyzed using standard techniques from the probabilistic method. The bounds obtained by this construction are given in Lemma 8 below. The number of random bits required by this construction is polynomially bounded in d and log n, which is significantly smaller than it would be had we picked the entries of M fully independently.

Lemma 8. For every $p \in [0, 1)$ and integer parameter u > 0, Construction 2 with $m' = O_u(d \log(n/d)/(1-p)^2)$ (resp., $m' = O_u(d^2 \log(n/d)/(1-p)^2)$) outputs a $(d, \Omega_u(pm'); u)$-regular (resp., $(d, \Omega_u(pm'/d); u)$-disjunct) matrix with probability $1 - o(1)$.

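Proof. We show the claim for regular matrices; the proof for disjunct matrices is similar. Consider any particular choice of a critical set $S \subseteq [n]$ and a zero set $Z \subseteq [n]$ such that $u \le |S| \le d$ and $|Z| \le |S|$. Choose an integer $i$ so that $2^{i-1}u \le |S| \le 2^i u$, and take any $j \in [m']$. Denote the $(i, j)$th row of $M$ by the random variable $w \in \{0,1\}^n$, and by $q$ the "success" probability that $w|_S$ has weight exactly $u$ and $w|_Z$ is all zeros. For an integer $\ell > 0$, we will use the shorthand $1^\ell$ (resp., $0^\ell$) for the all-ones (resp., all-zeros) vector of length $\ell$. We have

$$
\begin{aligned}
q &= \sum_{R \subseteq S,\ |R| = u} \Pr\big[(w|_R) = 1^u \wedge (w|_{(Z \cup S)\setminus R}) = 0^{|S|+|Z|-u}\big] \\
  &= \sum_{R} \Pr\big[(w|_R) = 1^u\big] \cdot \Pr\big[(w|_{(Z \cup S)\setminus R}) = 0^{|S|+|Z|-u} \mid (w|_R) = 1^u\big] \\
  &\overset{(a)}{=} \sum_{R} \Big(\frac{1}{2^{i+2}u}\Big)^u \cdot \Big(1 - \Pr\big[(w|_{(Z \cup S)\setminus R}) \neq 0^{|S|+|Z|-u} \mid (w|_R) = 1^u\big]\Big) \\
  &\overset{(b)}{\ge} \sum_{R} \Big(\frac{1}{2^{i+2}u}\Big)^u \cdot \Big(1 - \frac{|S|+|Z|-u}{2^{i+2}u}\Big) \\
  &\ge \frac{1}{2}\binom{|S|}{u}\Big(\frac{1}{2^{i+2}u}\Big)^u \ge \frac{1}{2}\Big(\frac{|S|}{u}\Big)^u\Big(\frac{1}{2^{i+2}u}\Big)^u \ge \frac{1}{2(8u)^u} =: c,
\end{aligned}
\tag{3}
$$

where (a) and (b) use the fact that the entries of $w$ are $(u+1)$-wise independent, and (b) uses an additional union bound. The last line uses $|S| + |Z| - u < 2|S| \le 2^{i+1}u$ (so that the second factor is at least $1/2$), together with $\binom{|S|}{u} \ge (|S|/u)^u$ and $|S| \ge 2^{i-1}u$. Here the lower bound $c > 0$ is a constant that only depends on $u$. Now, let $e := m'pq$. Using Chernoff bounds and the independence of the rows, the probability that there are at most $e$ rows (among $(i, 1), \ldots, (i, m')$) whose restrictions to $S$ and $Z$ have weights $u$ and $0$, respectively, is upper bounded by

$$\exp\big(-(m'q - e)^2/(2m'q)\big) = \exp\big(-(1-p)^2 m'q/2\big) \le \exp\big(-(1-p)^2 m'c/2\big).$$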

Now take a union bound on all the choices of $S$ and $Z$ to conclude that the probability that the resulting matrix is not $(d, e; u)$-regular is at most

$$\Big(\sum_{s=u}^{d}\binom{n}{s}\sum_{z=0}^{s}\binom{n-s}{z}\Big) \cdot \exp\big(-(1-p)^2 m'c/2\big),$$

which can be made $o(1)$ by choosing $m' = O_u(d\log(n/d)/(1-p)^2)$.

The proof of the claim for disjunct matrices follows along the same lines, except that we additionally need the vector $w$ to be 1 at the position corresponding to the distinguished column $i$. Under this additional requirement, the lower bound on $q$ becomes $\Omega_u(1/d)$, and this only increases the number of rows by a factor $O_u(d)$. □

• Given: Integer parameters $n, m', d, u$.

• Output: An $m \times n$ Boolean matrix $M$, where $m := m'\lceil \log(d/u) \rceil$.

• Construction: Let $r := \lceil \log(d/u) \rceil$. Index the rows of $M$ by $[r] \times [m']$. Sample the $(i, j)$th row of $M$ independently from a $(u+1)$-wise independent distribution on $n$-bit vectors, where each individual bit has probability $1/(2^{i+2}u)$ of being 1.

Construction 2: Probabilistic construction of regular and disjunct matrices.
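Inequality (3) and the union bound can be sanity-checked numerically. The sketch below assumes fully independent bits, which are in particular $(u+1)$-wise independent, so the exact success probability $q = \binom{|S|}{u}\pi_i^u(1-\pi_i)^{|S|+|Z|-u}$, with $\pi_i = 1/(2^{i+2}u)$ the bit probability, must dominate $c = 1/(2(8u)^u)$ whenever $2^{i-1}u \le |S| \le 2^i u$ and $|Z| \le |S|$. The helper `smallest_m_prime` (a hypothetical name, as are the others) returns the smallest $m'$ for which the union bound above drops below a chosen failure probability $\delta$. Since it relies on the loose constant $c$, its values are far from tight and only indicate how $m'$ scales with $d\log(n/d)/(1-p)^2$.

```python
import math


def q_independent(s, z, u, i):
    """Exact success probability for fully independent bits, each equal to 1
    with probability pi = 1 / (2^(i+2) u): exactly u ones inside S (|S| = s)
    and zeros on the remaining |S| + |Z| - u positions (|Z| = z)."""
    pi = 1.0 / (2 ** (i + 2) * u)
    return math.comb(s, u) * pi ** u * (1 - pi) ** (s + z - u)


def c_bound(u):
    """The constant c = 1 / (2 (8u)^u) from inequality (3)."""
    return 1.0 / (2 * (8 * u) ** u)


def log_union_count(n, d, u):
    """Natural log of sum_{s=u}^{d} C(n, s) * sum_{z=0}^{s} C(n - s, z)."""
    total = sum(math.comb(n, s) * sum(math.comb(n - s, z) for z in range(s + 1))
                for s in range(u, d + 1))
    return math.log(total)


def smallest_m_prime(n, d, u, p, delta=0.01):
    """Smallest m' with (number of (S, Z) pairs) * exp(-(1-p)^2 m' c / 2) <= delta."""
    c = c_bound(u)
    need = 2 * (log_union_count(n, d, u) - math.log(delta)) / ((1 - p) ** 2 * c)
    return math.ceil(need)
```

For example, `smallest_m_prime(1000, 10, 2, 0.5)` is smaller than `smallest_m_prime(1000, 20, 2, 0.5)`, reflecting the growth of the union bound with $d$.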

A significant part of this work is a construction of regular matrices using strong lossless condensers. Details of the construction are described in Construction 4, which assumes a family of lossless condensers with different entropy requirements⁷ and, in turn, uses Construction 3 as a building block. The theorem below analyzes the obtained parameters without specifying any particular choice for the underlying family of condensers.



• Given: A strong lossless $(k, \epsilon)$-condenser $f\colon \{0,1\}^{\tilde n} \times \{0,1\}^t \to \{0,1\}^\ell$, an integer parameter $u \ge 1$, and a real parameter $p \in [0, 1)$ such that $\epsilon < (1-p)/16$.

• Output: An $m \times n$ Boolean matrix $M$, where $n := 2^{\tilde n}$ and $m = 2^{t+k}O_u(2^{u(\ell-k)})$.

• Construction: Let $G_1 = (\{0,1\}^\ell, \{0,1\}^k, E_1)$ be any bipartite bi-regular graph with left vertex set $\{0,1\}^\ell$, right vertex set $\{0,1\}^k$, edge set $E_1$, left degree $d_\ell := 8u$, and right degree $d_r := 8u2^{\ell-k}$. Replace each right vertex $v$ of $G_1$ with $\binom{d_r}{u}$ vertices, one for each subset of size $u$ of the vertices in the neighborhood of $v$, and connect them to the vertices in the corresponding subsets. Denote the resulting graph by $G_2 = (\{0,1\}^\ell, V_2, E_2)$, where $|V_2| = 2^k\binom{d_r}{u}$ and $E_2$ is the edge set of the graph. Define the bipartite graph $G_3 = (\{0,1\}^{\tilde n}, V_3, E_3)$, where $V_3 := \{0,1\}^t \times V_2$ is the set of right vertices, as follows: each left vertex $x \in \{0,1\}^{\tilde n}$ is connected to every vertex in $\{y\} \times \Gamma_2(f(x, y))$, for each $y \in \{0,1\}^t$, where $\Gamma_2(\cdot)$ denotes the neighborhood function of $G_2$ (i.e., $\Gamma_2(v)$ denotes the set of vertices adjacent to $v$ in $G_2$). The output matrix $M$ is the bipartite adjacency matrix of $G_3$, with columns indexed by the left vertices and rows indexed by the right vertices of the graph.



Construction 3: A building block for construction of regular matrices. 
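To make the three graphs concrete, the following Python sketch builds the matrix of Construction 3 for toy parameters. Two ingredients are our own assumptions: the bi-regular graph $G_1$ joins left vertex $a$ to the right vertices $(8ua + j) \bmod 2^k$ for $j < 8u$ (bi-regular whenever $\ell \ge k$, and free of parallel edges when $8u \le 2^k$), and the map `toy_f` is an arbitrary hypothetical stand-in that is not a lossless condenser, so the output carries no regularity guarantee. The sketch only reproduces the bookkeeping of the construction: $m = 2^t2^k\binom{d_r}{u}$ rows, and every column has weight $2^t d_\ell\binom{d_r-1}{u-1}$, because each $f(x, y)$ has $d_\ell$ neighbors in $G_1$ and lies in $\binom{d_r-1}{u-1}$ of the $u$-subsets of each of their neighborhoods.

```python
import math
from itertools import combinations


def construction3(f, n_bits, t, k, ell, u):
    """Sketch of Construction 3 for a map f(x, y) with x < 2^n_bits, y < 2^t
    and values below 2^ell. Returns the rows of M as 0/1 lists of length 2^n_bits.

    Assumption: G1 joins left vertex a to right vertices (8u*a + j) mod 2^k,
    j < 8u. This is bi-regular (right degree 8u * 2^(ell-k)) when ell >= k and
    has no parallel edges when 8u <= 2^k.
    """
    d_left = 8 * u
    assert ell >= k and d_left <= 2 ** k
    nbrs = [[] for _ in range(2 ** k)]  # right vertex of G1 -> left neighbours
    for a in range(2 ** ell):
        for j in range(d_left):
            nbrs[(a * d_left + j) % 2 ** k].append(a)
    # Right vertices of G2: one per (v, u-subset of the neighbourhood of v).
    v2 = [frozenset(sub) for v in range(2 ** k) for sub in combinations(nbrs[v], u)]
    n = 2 ** n_bits
    rows = []
    for y in range(2 ** t):
        images = [f(x, y) for x in range(n)]
        for sub in v2:
            # Right vertex (y, sub) of G3 is adjacent to x iff f(x, y) lies in sub.
            rows.append([1 if images[x] in sub else 0 for x in range(n)])
    return rows


# Usage with toy parameters; toy_f is a hypothetical placeholder, NOT a condenser.
u, t, k, ell, n_bits = 1, 1, 3, 3, 4
toy_f = lambda x, y: (x + 3 * y) % 2 ** ell
M = construction3(toy_f, n_bits, t, k, ell, u)  # 2 * 8 * 8 = 128 rows, 16 columns
```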



Theorem 9. The $m \times n$ matrix $M$ output by Construction 4 is $(d, p\gamma 2^t; u)$-regular, where $\gamma = \max\{1, \Omega_u(d \cdot \min\{2^{k(i)-\ell(i)} : i = 0, \ldots, r\})\}$.

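Proof. As a first step, we verify the upper bound on the number of measurements $m$. Each matrix $M_i$ has $m_i = 2^{k(i)+t}O_u(2^{u(\ell(i)-k(i))})$ rows, and $M'_i$ has $m_i r_i$ rows, where $r_i = 2^{r-i}$. Therefore, the number of rows of $M$ is

$$\sum_{i=0}^{r} r_i m_i = \sum_{i=0}^{r} 2^{t + \log u' + r + 1}\,O_u(2^{u(\ell(i)-k(i))}) = 2^t d\sum_{i=0}^{r} O_u(2^{u(\ell(i)-k(i))}),$$

where the last equality uses $u'2^r < 2d$, which follows from $r = \lceil \log(d/u') \rceil$.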

⁷ We have assumed that all the functions in the family have the same seed length $t$. If this is not the case, one can trivially set $t$ to be the largest seed length in the family.
• Given: Integer parameters $d \ge u \ge 1$, a real parameter $p \in [0, 1)$, and a family $f_0, \ldots, f_r$ of strong lossless condensers, where $r := \lceil \log(d/u') \rceil$ and $u'$ is the smallest power of two such that $u' \ge u$. Each $f_i\colon \{0,1\}^{\tilde n} \times \{0,1\}^t \to \{0,1\}^{\ell(i)}$ is assumed to be a strong lossless $(k(i), \epsilon)$-condenser, where $k(i) := \log u' + i + 1$ and $\epsilon < (1-p)/16$.

• Output: An $m \times n$ Boolean matrix $M$, where $n := 2^{\tilde n}$ and $m = 2^t d\sum_{i=0}^{r} O_u(2^{u(\ell(i)-k(i))})$.

• Construction: For each $i \in \{0, \ldots, r\}$, denote by $M_i$ the output matrix of Construction 3 when instantiated with $f_i$ as the underlying condenser, and by $m_i$ its number of rows. Define $r_i := 2^{r-i}$ and let $M'_i$ denote the matrix obtained from $M_i$ by repeating each row $r_i$ times. Construct the output matrix $M$ by stacking $M'_0, \ldots, M'_r$ on top of one another.



Construction 4: Regular matrices from strong lossless condensers. 
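The stacking step of Construction 4 and the row count in the proof of Theorem 9 can be checked with a few lines of Python. The sketch below (function names are hypothetical) repeats each row of $M_i$ exactly $r_i = 2^{r-i}$ times, and evaluates $\sum_i r_i m_i$ using the exact count $m_i = 2^{t+k(i)}\binom{8u2^{\ell(i)-k(i)}}{u}$ from Construction 3, with the excesses $\ell(i) - k(i)$ supplied by the caller. Since $k(i) = \log u' + i + 1$, every term carries the same factor $2^{t + \log u' + r + 1}$, which is less than $2^{t+2}d$ because $u'2^r < 2d$.

```python
import math


def stack_construction4(blocks, r):
    """Construction 4: blocks[i] is M_i (a list of rows), i = 0..r; each row of
    M_i is repeated r_i = 2^(r - i) times and the results are stacked."""
    assert len(blocks) == r + 1
    out = []
    for i, Mi in enumerate(blocks):
        for row in Mi:
            out.extend([row] * 2 ** (r - i))
    return out


def rows_construction4(t, u, d, excess):
    """Exact number of rows sum_i 2^(r-i) * 2^(t + k(i)) * C(8u * 2^excess[i], u),
    where excess[i] = ell(i) - k(i) and k(i) = log u' + i + 1 (assumes d >= u')."""
    u_prime = 1 << (u - 1).bit_length()   # smallest power of two >= u
    log_u_prime = u_prime.bit_length() - 1
    r = math.ceil(math.log2(d / u_prime))
    assert len(excess) == r + 1
    total = 0
    for i, x in enumerate(excess):
        k_i = log_u_prime + i + 1
        m_i = 2 ** (t + k_i) * math.comb(8 * u * 2 ** x, u)
        total += 2 ** (r - i) * m_i
    return total
```

With $t = 3$, $u = 2$ (so $u' = 2$), and $d = 20$ we get $r = 4$, and each of the five terms has the common factor $2^{3+1+4+1} = 2^9$.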

Let $S, Z \subseteq \{0,1\}^{\tilde n}$ respectively denote any choice of a critical set and zero set of size at most $d$, where $|Z| \le |S|$, and choose an integer $i \ge 0$ so that $2^{i-1}u' < |S| \le 2^i u'$. Arbitrarily grow the two sets $S$ and $Z$ to possibly larger, and disjoint, sets $S' \supseteq S$ and $Z' \supseteq Z$ such that $|S'| = |Z'| = 2^i u'$ (for simplicity we have assumed that $d \le n/2$). Our goal is to show that there are "many" rows of the matrix $M_i$ (in Construction 3) that $u$-satisfy $S$ and $Z$.

Let $k := k(i) = \log u' + i + 1$, $\ell := \ell(i)$, and denote by $G_1, G_2, G_3$ the bipartite graphs used by the instantiation of Construction 3 that outputs $M_i$. Thus we need to show that "many" right vertices of $G_3$ are each connected to exactly $u$ of the vertices in $S$ and none of those in $Z$.

Consider the uniform distribution $X$ on the set $S' \cup Z'$, which has min-entropy $\log u' + i + 1$. By an averaging argument, since the condenser $f_i$ is strong, for more than a $p$ fraction of the choices of the seed $y \in \{0,1\}^t$ (call them good seeds), the distribution $Z_y := f_i(X, y)$ is $\epsilon/(1-p)$-close (in particular, $1/16$-close) to a distribution with min-entropy $\log u' + i + 1$.

Fix any good seed $y \in \{0,1\}^t$. Let $G = (\{0,1\}^{\tilde n}, \{0,1\}^\ell, E)$ denote a bipartite graph representation of $f_i$, where each left vertex $x \in \{0,1\}^{\tilde n}$ is connected to $f_i(x, y)$ on the right. Denote by $\Gamma_y(S' \cup Z')$ the right vertices of $G$ corresponding to the neighborhood of the set of left vertices picked by $S' \cup Z'$. Note that $\Gamma_y(S' \cup Z') = \mathrm{supp}(Z_y)$. Using Proposition 2, we see that since $Z_y$ is $1/16$-close to having min-entropy $\log(|S' \cup Z'|)$, there are at least $(7/8)|S' \cup Z'|$ vertices in $\Gamma_y(S' \cup Z')$ that are each connected to exactly one left vertex in $S' \cup Z'$. Since $|S| \ge |S' \cup Z'|/4$, at most $(3/4)|S' \cup Z'|$ of these vertices can have their unique neighbor outside $S$; hence at least $|S' \cup Z'|/8$ vertices in $\Gamma_y(S' \cup Z')$ (call them $\Gamma'_y$) are connected to exactly one left vertex in $S$ and no other vertex in $S' \cup Z'$. In particular, since $|S' \cup Z'| = 2^{i+1}u' = 2^k$, we get that $|\Gamma'_y| \ge 2^{k-3}$.

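Now, in $G_1$, let $T_y$ be the set of left vertices corresponding to $\Gamma'_y$ (regarding the left vertices of $G_1$ in one-to-one correspondence with the right vertices of $G$). The number of edges going out of $T_y$ in $G_1$ is $d_\ell|T_y| \ge u2^k$. Therefore, as the number of right vertices of $G_1$ is $2^k$, there must be at least one right vertex that is connected to at least $u$ vertices in $T_y$. Moreover, a counting argument shows that the number of right vertices connected to $u$ or more vertices in $T_y$ is at least $2^{k-\ell}2^k/(10u)$: the right vertices with fewer than $u$ neighbors in $T_y$ absorb at most $(u-1)2^k$ of these edges, so at least $2^k$ edges land on right vertices of degree $d_r = 8u2^{\ell-k}$, and there must be at least $2^k/d_r = 2^{k-\ell}2^k/(8u)$ of them.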

Observe that in the construction of $G_2$ from $G_1$, any right vertex of $G_1$ is replicated $\binom{d_r}{u}$ times, one for each $u$-subset of its neighbors. Therefore, for a right vertex of $G_1$ that is connected to at least $u$ left vertices in $T_y$, one or more of its copies in $G_2$ must be connected to exactly $u$ vertices in $T_y$ (among the left vertices of $G_2$) and no other vertex (since the right degree of $G_2$ is equal to $u$).

Define $\gamma' := \max\{1, 2^{k-\ell}2^k/(10u)\}$. From the previous argument we know that, looking at $T_y$ as a set of left vertices of $G_2$, there are at least $\gamma'$ right vertices in the neighborhood of $T_y$ in $G_2$ that are connected to exactly $u$ of the vertices in $T_y$ and none of the left vertices outside $T_y$. Letting $v_y$ be any such vertex, this implies that the vertex $(y, v_y) \in V_3$ on the right part of $G_3$ is connected to exactly $u$ of the vertices in $S$, and none of the vertices in $Z$. Since the argument holds for every good seed $y$, the number of such vertices is at least $\gamma'$ times the number of good seeds, that is, more than $p\gamma' 2^t$. Since the rows of the matrix $M_i$ are repeated $r_i = 2^{r-i}$ times in $M$, we conclude that $M$ has at least $p\gamma' 2^{t+r-i} \ge p\gamma 2^t$ rows that $u$-satisfy $S$ and $Z$, and the claim follows.

□ 
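The averaging step in the proof of Theorem 9 (once $|T_y| \ge 2^{k-3}$, at least $2^k/d_r$ right vertices of $G_1$ have $u$ or more neighbors in $T_y$) can be spot-checked numerically. In the sketch below we assume, purely for illustration, that $G_1$ joins left vertex $a$ to the right vertices $(8ua + j) \bmod 2^k$ for $j < 8u$, which is bi-regular without parallel edges when $\ell \ge k$ and $8u \le 2^k$; the function name is hypothetical. Note that $2^k/d_r = 2^{k-\ell}2^k/(8u)$, slightly more than the bound $2^{k-\ell}2^k/(10u)$ used in the proof.

```python
import random


def heavy_right_vertices(k, ell, u, T):
    """Number of right vertices of G1 with at least u neighbours in T, where
    (as an illustrative assumption) left vertex a of G1 is joined to the right
    vertices (8u*a + j) mod 2^k for j < 8u; needs ell >= k and 8u <= 2^k."""
    d_left = 8 * u
    hits = [0] * 2 ** k
    for a in T:
        for j in range(d_left):
            hits[(a * d_left + j) % 2 ** k] += 1
    return sum(1 for h in hits if h >= u)


rng = random.Random(7)
for u, k, ell in [(1, 3, 5), (2, 4, 6), (2, 5, 7), (3, 5, 6)]:
    d_right = 8 * u * 2 ** (ell - k)
    for _ in range(50):
        T = rng.sample(range(2 ** ell), 2 ** (k - 3))  # |T| = 2^(k-3)
        # Averaging bound from the proof: at least 2^k / d_r heavy right vertices.
        assert heavy_right_vertices(k, ell, u, T) * d_right >= 2 ** k
```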

3.1 Instantiations 

We now instantiate the result obtained in Theorem 9 by various choices of the family of lossless condensers. The crucial factors that influence the number of measurements are the seed length and the output length of the condenser.

Non-constructively, it can be shown that strong $(k, \epsilon)$ lossless condensers with input length $\tilde n$, seed length $t = \log\tilde n + \log(1/\epsilon) + O(1)$, and output length $\ell = k + \log(1/\epsilon) + O(1)$ exist, and moreover, almost matching lower bounds are known [3]. In fact, the optimal parameters can be achieved by a random function with overwhelming probability.⁸ In this work, we consider two important explicit constructions of lossless condensers, namely, one based on "zig-zag products" due to Capalbo et al. [4] and another, coding-theoretic, construction due to Guruswami et al. [22].

Theorem 10. [4] For every $k \le \tilde n \in \mathbb{N}$, $\epsilon > 0$ there is an explicit lossless $(k, \epsilon)$ condenser with seed length $O(\log^3(\tilde n/\epsilon))$ and output length $k + \log(1/\epsilon) + O(1)$.

Theorem 11. [22] For all constants $\alpha \in (0, 1)$ and every $k \le \tilde n \in \mathbb{N}$, $\epsilon > 0$ there is an explicit strong lossless $(k, \epsilon)$ condenser with seed length $t = (1 + 1/\alpha)\log(\tilde n k/\epsilon) + O(1)$ and output length $\ell = t + (1 + \alpha)k$.

As a result, we use Theorem 9 with the above condensers to obtain the following.

Theorem 12. Let $u > 0$ be fixed, and let $p \in [0, 1)$ be a real parameter. Then for integer parameters $d, n \in \mathbb{N}$ where $u \le d \le n$,

1. Using an optimal lossless condenser in Construction 4 results in an $m_1 \times n$ matrix $M_1$ that is $(d, e_1; u)$-regular, where $m_1 = O(d(\log n)(\log d)/(1-p)^{u+1})$ and $e_1 = \Omega(pd\log n)$,

2. Using the lossless condenser of Theorem 10 in Construction 4 results in an $m_2 \times n$ matrix $M_2$ that is $(d, e_2; u)$-regular, where $m_2 = O(T_2 d(\log d)/(1-p)^u)$ for some $T_2 = \exp(O(\log^3((\log n)/(1-p)))) = \mathrm{quasipoly}(\log n)$, and $e_2 = \Omega(pdT_2(1-p))$.

3. Let $\beta > 0$ be any fixed constant. Then Construction 4 can be instantiated using the lossless condenser of Theorem 11 so that we obtain an $m_3 \times n$ matrix $M_3$ that is $(d, e_3; u)$-regular, where $m_3 = O(T_3^{1+u}d^{1+\beta}(\log d))$ for $T_3 := ((\log n)(\log d)/(1-p))^{1+u/\beta} = \mathrm{poly}(\log n, \log d)$, and $e_3 = \Omega(p\max\{T_3, d^{1-\beta/u}\})$.

Proof. First we show the claim for $M_1$. In this case, we take each $f_i$ in Construction 4 to be an optimal lossless condenser satisfying the (non-constructive) bounds obtained in [1]. Thus we have that $2^t = O(\tilde n/\epsilon) = O(\log n/\epsilon)$, and for every $i = 0, \ldots, r$, we have $2^{\ell(i)-k(i)} = O(1/\epsilon)$, where we choose $\epsilon = \Theta(1-p)$ (subject to $\epsilon < (1-p)/16$). Now we apply Theorem 9 to obtain the desired bounds (and in particular, $\gamma = \Omega(\epsilon d)$).

Similarly, for the construction of $M_2$ we set up each $f_i$ with the explicit construction of condensers in Theorem 10 for min-entropy $k(i)$. In this case, the maximum required seed length is $t = O(\log^3(\tilde n/\epsilon))$, and we let $T_2 := 2^t = \exp(O(\log^3((\log n)/(1-p))))$. Moreover, for every $i = 0, \ldots, r$, we have $2^{\ell(i)-k(i)} = O(1/\epsilon)$. Plugging these parameters into Theorem 9 gives $\gamma = \Omega(\epsilon d)$, and the bounds on $m_2$ and $e_2$ follow.

⁸ This result is similar in spirit to the probabilistic argument used in [31] for showing the existence of good extractors.

Finally, for $M_3$ we use Theorem 11 with $\alpha := \beta/u$. Thus the maximum seed length becomes $t = (1 + u/\beta)\log(\tilde n(\log d)/(1-p)) + O(1)$, and for every $i = 0, \ldots, r$, we have $\ell(i) - k(i) = t + \alpha k(i) \le t + \beta(\log d)/u + O(1)$. Clearly, $T_3 = \Theta(2^t)$, and thus (using Theorem 9) the number of measurements becomes $m_3 = O(T_3^{1+u}d^{1+\beta}(\log d))$. Moreover, we get $\gamma = \max\{1, \Omega(d^{1-\beta/u}/T_3)\}$, which gives $e_3 = \Omega(p\gamma T_3) = \Omega(p\max\{T_3, d^{1-\beta/u}\})$, as claimed. □

By combining this result with Lemma 7, using any explicit construction of classical disjunct matrices, we obtain $(d, e; u)$-disjunct matrices that can be used in the threshold model with any fixed threshold, sparsity $d$, and error tolerance $\lfloor e/2 \rfloor$. In particular, using the coding-theoretic explicit construction of nearly optimal classical disjunct matrices from codes on the Gilbert-Varshamov bound [30] (Theorem 19 in the appendix), we obtain $(d, e; u)$-disjunct matrices with $m = O(m'd^2(\log n)/(1-p)^2)$ rows and error tolerance $e = \Omega(e'pd(\log n)/(1-p))$, where $m'$ and $e'$ are respectively the number of rows and error tolerance of any of the regular matrices obtained in Theorem 12. We note that in all cases, the final dependence on the sparsity parameter $d$ is, roughly, $O(d^3)$, which has an exponent independent of the threshold $u$. Rows M7-M9 of Table 1 summarize the obtained parameters for the general case (with arbitrary gaps). We see that, when $d$ is not negligibly small (e.g., $d = n^{\Omega(1)}$), the bounds obtained by our explicit constructions are significantly better than those offered by strongly disjunct matrices.

4 The case with positive gaps

In the preceding sections we have focused on the case where $g = 0$. In this section, however, we observe that all the techniques that we have developed in this work can be extended to the positive-gap case in a straightforward way. The main observations are listed below. Recall from [14] that in the positive-gap case, we can only hope to distinguish between distinct $d$-sparse vectors $x$ and $x'$ where at least one has support size $u$ or more and either $|\mathrm{supp}(x) \setminus \mathrm{supp}(x')| > g$ or $|\mathrm{supp}(x') \setminus \mathrm{supp}(x)| > g$. We will call any pair of such vectors distinguishable.

Observation 1 

For the positive-gap case, Definition 1 of threshold disjunct matrices can be adapted to allow more than one distinguished column. In particular, we may in general require the matrix $M$ to have more than $e$ rows that $u$-satisfy every choice of a critical set $S$, a zero set $Z$, and any set of $g + 1$ designated columns $I \subseteq S$ (at which all entries of the corresponding rows must be 1). Denote this generalized notion by $(d, e; u, g)$-disjunct matrices. It is straightforward to extend the arguments of Lemma 5 to show that the generalized notion of $(d, e; u, g)$-disjunct matrices is necessary and sufficient to capture non-adaptive threshold group testing with upper threshold $u$ and gap $g$. More precisely, the generalized definitions of threshold disjunct and regular matrices are as follows.

Definition 13 (Definition 1 generalized). Let $n, d, e, u, g$ be non-negative integers where $g < u \le d \le n$. A Boolean matrix $M$ with $n$ columns is called $(d, e; u, g)$-disjunct if for every subset of columns $S \subseteq [n]$ (called the critical set), every $Z \subseteq [n]$ (called the zero set) such that $u \le |S| \le d$, $|Z| \le |S|$, $S \cap Z = \emptyset$, and every set $I \subseteq S$ of $g + 1$ distinguished columns ($|I| = g + 1$), there are more than $e$ rows of $M$ that $u$-satisfy $S$ and $Z$ and, moreover, at which $M|_I$ is all ones.
Moreover, $M$ is called $(d, e; u, g)$-regular if for every choice of the critical and zero sets $S, Z \subseteq [n]$ with $|Z| \le |S| + g$, there is a set of more than $e$ rows of $M$ that $(u - g)$-satisfy $S$ and $Z$.
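For very small matrices, Definition 13 can be checked exhaustively. The brute-force Python sketch below (exponential time, for illustration only; function names are ours) tests both parts of the definition. For the regular part it assumes the same constraints $u \le |S| \le d$ and $S \cap Z = \emptyset$ as in the disjunct part. It only enumerates zero sets of the largest allowed size, which suffices because a row that works for a zero set also works for each of its subsets.

```python
from itertools import combinations


def satisfies(row, S, Z, w):
    """True iff the row w-satisfies (S, Z): weight exactly w on S, all zeros on Z."""
    return sum(row[c] for c in S) == w and not any(row[c] for c in Z)


def is_gap_disjunct(M, d, e, u, g):
    """Brute-force test of the disjunct part of Definition 13 (toy sizes only)."""
    n = len(M[0])
    for s in range(u, d + 1):
        for S in combinations(range(n), s):
            rest = [c for c in range(n) if c not in S]
            # Largest zero sets suffice: a good row for Z is good for every subset of Z.
            for Z in combinations(rest, min(s, len(rest))):
                for I in combinations(S, g + 1):
                    good = sum(1 for row in M
                               if satisfies(row, S, Z, u) and all(row[c] for c in I))
                    if good <= e:
                        return False
    return True


def is_gap_regular(M, d, e, u, g):
    """Brute-force test of the regular part of Definition 13, assuming (as for the
    disjunct part) u <= |S| <= d and S, Z disjoint, with |Z| <= |S| + g."""
    n = len(M[0])
    for s in range(u, d + 1):
        for S in combinations(range(n), s):
            rest = [c for c in range(n) if c not in S]
            for Z in combinations(rest, min(s + g, len(rest))):
                if sum(1 for row in M if satisfies(row, S, Z, u - g)) <= e:
                    return False
    return True


all_rows = [[(x >> b) & 1 for b in range(4)] for x in range(16)]  # every 0/1 row of length 4
```

For the $16 \times 4$ matrix `all_rows`, for example, $(2, 0; 2, 0)$-disjunctness holds while $(2, 1; 2, 0)$-disjunctness fails: for $|S| = 2$ and $Z$ the other two columns, only the row that is 1 exactly on $S$ qualifies.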

Note the slight difference between the notion of regular matrices above and the original (gap-free) definition, namely, that the zero set $Z$ can now be slightly larger than the critical set $S$ (by at most $g$), and that the matrix is now required to $(u - g)$-satisfy (as opposed to $u$-satisfy) every choice of $S$ and $Z$. The two notions coincide for $g = 0$. In general, the difference between the two notions of regular matrices is negligible as long as the parameter $g$ remains small. In particular, it is straightforward to verify that all our results about the construction of regular matrices in the gap-free case (Constructions 2 and 4) as well as the obtained bounds (Lemma 8, Theorem 9, and Theorem 12) hold for the generalized notion of regular matrices with only a slight effect on the hidden terms that only depend on the threshold parameter $u$. We will see, however, that the generalized notion of threshold-disjunct matrices is stronger than Definition 4 and that the extra requirements may substantially affect the bounds (but not the construction techniques).

Below we show that the generalized notion of threshold disjunct matrices precisely captures the 
combinatorial structure needed for threshold group testing with arbitrary gap. 

Lemma 14 (Lemma 5 generalized). Let $M$ be an $m \times n$ Boolean matrix that is $(d, e; u, g)$-disjunct, and define $\ell := u - g$. Then for every pair of distinguishable $d$-sparse vectors $x, x' \in \{0,1\}^n$, each having support size $u$ or more and such that⁹ $|\mathrm{supp}(x) \setminus \mathrm{supp}(x')| > g$ and $\mathrm{wgt}(x) \ge |\mathrm{supp}(x') \setminus \mathrm{supp}(x)|$, the following holds. Let $y \in M[x]_{\ell,u}$ and $y' \in M[x']_{\ell,u}$. Then,

$$|\mathrm{supp}(y) \setminus \mathrm{supp}(y')| > e. \tag{4}$$

Conversely, if $M$ satisfies (4) for every choice of $x, x', y, y'$ as above, it must be $(\lfloor d/2 \rfloor, e; u, g)$-disjunct (assuming $n > d + g$).

Proof. First, suppose that $M$ is $(d, e; u, g)$-disjunct, and let $y \in M[x]_{\ell,u}$ and $y' \in M[x']_{\ell,u}$. Take any $I \subseteq \mathrm{supp}(x) \setminus \mathrm{supp}(x')$ of size $g + 1$, and let $S := \mathrm{supp}(x)$ and $Z := \mathrm{supp}(x') \setminus \mathrm{supp}(x)$. Note that $|S| \le d$ and, by assumption, $|Z| \le |S|$. Now, Definition 13 implies that there is a set $E$ of more than $e$ rows of $M$ that $u$-satisfy $I$ as the set of distinguished columns, $S$ as the critical set, and $Z$ as the zero set. Thus for every $j \in E$, the $j$th row of $M$ restricted to the columns chosen by $\mathrm{supp}(x)$ must have weight exactly $u$, while its weight on $\mathrm{supp}(x')$ is less than $u - g = \ell$ (it is zero on $Z$, and at most $u - (g+1)$ on $\mathrm{supp}(x') \cap \mathrm{supp}(x) \subseteq S \setminus I$). Therefore, $y(j) = 1$ and $y'(j) = 0$ for more than $e$ choices of $j$.

For the converse, consider any choice of a set of distinguished columns $I \subseteq [n]$ with $|I| = g + 1$, a critical set $S \subseteq [n]$ containing $I$ (such that $u \le |S| \le \lfloor d/2 \rfloor$), and a zero set $Z \subseteq [n]$ disjoint from $S$ where $|Z| \le |S|$. Define $d$-sparse Boolean vectors $x, x' \in \{0,1\}^n$ so that $\mathrm{supp}(x) := S$ and $\mathrm{supp}(x') := Z \cup (S \setminus I)$. We note that $|\mathrm{supp}(x)| \ge u$ and also, without loss of generality, $|\mathrm{supp}(x')| \ge u$ (if the latter is not the case, we can simply enlarge $Z$ by arbitrarily adding up to $g + 1$ elements outside $S \cup Z$ to it and observe that it suffices to show the claim for the larger $Z$).

Let $y \in M[x]_{\ell,u}$ be the outcome that is 1 exactly at the rows where the weight on $\mathrm{supp}(x)$ is at least $u$, let $y' \in M[x']_{\ell,u}$ be the outcome that is 0 exactly at the rows where the weight on $\mathrm{supp}(x')$ is less than $\ell$ (both are valid outcomes of the threshold model), and let $E := \mathrm{supp}(y) \setminus \mathrm{supp}(y')$. By assumption we know that $|E| > e$. Take any $j \in E$. Since $y(j) = 1$ and $y'(j) = 0$, we get that the $j$th row of $M$ restricted to the columns picked by $Z \cup (S \setminus I)$ must have weight at most $\ell - 1 = u - (g + 1)$, whereas it must have weight at least $u$ when restricted to $S$. As the sets $I$, $S \setminus I$, and $Z$ are disjoint and $|I| = g + 1$, this can hold only if the $j$th row of $M$ restricted to the columns picked by $S$, $Z$, and $I$ has weights exactly $u$, $0$, and $g + 1$, respectively. Hence, this row (as well as all the rows of $M$ picked by $E$) must $u$-satisfy $I$, $S$, and $Z$, confirming that $M$ is $(\lfloor d/2 \rfloor, e; u, g)$-disjunct. □

⁹ As in Lemma 5, one can verify that at least one of the two possible orderings of any pair of distinguishable vectors satisfies this condition.
Note that, for technical reasons, the above lemma assumes both vectors $x$ and $x'$ to have support size $u$ or more. This is no loss of generality since, if $|\mathrm{supp}(x)| \ge u$ while $|\mathrm{supp}(x')| < u$, we have $M[x']_{\ell,u} = 0$, and one can trivially use the disjunctness property of $M$ to show that $|\mathrm{supp}(M[x]_{\ell,u})| > e$.
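The quantifier over all outcomes in Lemma 14 can be made concrete. In the threshold model, a row is forced to read 1 when its weight on $\mathrm{supp}(x)$ is at least $u$, forced to read 0 when that weight is below $\ell$, and may read either value in between. The smallest possible value of $|\mathrm{supp}(y) \setminus \mathrm{supp}(y')|$ is therefore the number of rows that are forced to 1 for $x$ and forced to 0 for $x'$, which is what the Python sketch below computes (function names are ours, chosen for illustration).

```python
def forced_rows(M, x, ell, u):
    """Rows whose threshold-model outcome is forced for the 0/1 vector x: a row
    reads 1 if its weight on supp(x) is >= u, 0 if it is < ell, and may read
    either value in between. Returns (forced_one, forced_zero) as sets."""
    supp = [c for c, b in enumerate(x) if b]
    forced_one, forced_zero = set(), set()
    for j, row in enumerate(M):
        w = sum(row[c] for c in supp)
        if w >= u:
            forced_one.add(j)
        elif w < ell:
            forced_zero.add(j)
    return forced_one, forced_zero


def worst_case_difference(M, x, x2, ell, u):
    """Minimum of |supp(y) minus supp(y2)| over y in M[x]_{ell,u} and y2 in
    M[x2]_{ell,u}: y is 1 only where forced, and y2 is 0 only where forced."""
    ones_x, _ = forced_rows(M, x, ell, u)
    _, zeros_x2 = forced_rows(M, x2, ell, u)
    return len(ones_x & zeros_x2)


all_rows = [[(r >> b) & 1 for b in range(4)] for r in range(16)]  # every 0/1 row of length 4
```

For instance, with `all_rows`, $x = (1,1,1,0)$, $x' = (0,0,0,1)$, $\ell = 1$, and $u = 2$, exactly the four rows with at least two ones among the first three columns and a zero in the last column are guaranteed to separate $x$ from $x'$.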

Observation 2 

The following proposition directly follows from the definitions, and relates strongly disjunct matrices 
to generalized threshold disjunct matrices. 

Proposition 15. Let $n, d, e, u, g$ be non-negative integers where $g < u \le d \le n - (d + g + 1)$. Suppose that $M$ and $M'$ are binary $m \times n$ matrices, where $M$ is $(d, e; u, g)$-disjunct and $M'$ is strongly $(2d, e; u)$-disjunct. Then, $M$ is strongly $(d, e; g + 1)$-disjunct and $M'$ is $(d, e; u, g)$-disjunct.

Proof. First, we verify the conditions of Definition 3 for $M$. Consider any pair of disjoint sets $I, Z \subseteq [n]$ where $|I| = g + 1$ and $|Z| \le d$. Let $S \subseteq [n]$ be any set of size $d$ containing $I$ and disjoint from $Z$. Note that $|Z| \le |S|$. From Definition 13 (with the critical set $S$, zero set $Z$, and distinguished set $I$), there is a set of more than $e$ rows of $M$ at which $M|_Z$ is all zeros and $M|_I$ is all ones. In other words, denoting the $i$th column of $M$ by $C_i$, we have that

$$\Big|\bigcap_{i \in I}\mathrm{supp}(C_i) \setminus \bigcup_{i \in Z}\mathrm{supp}(C_i)\Big| > e,$$

as required by Definition 3.

Now consider the matrix $M'$ and any choice of $I, S, Z$ as in Definition 13. Let $J \subseteq S$ be any subset of $S$ of size $u$ that contains $I$, and let $S' := Z \cup (S \setminus J)$. Note that $|S'| \le |S| + |Z| \le 2d$. Now, from Definition 3 of strongly disjunct matrices (with $C_i$ now denoting the $i$th column of $M'$), we know that

$$\Big|\bigcap_{i \in J}\mathrm{supp}(C_i) \setminus \bigcup_{i \in S'}\mathrm{supp}(C_i)\Big| > e.$$

In other words, there is a set of more than $e$ rows of $M'$ at which $M'|_I$ is all ones, $M'|_S$ has weight exactly $u$, and $M'|_Z$ is all zeros, as required by Definition 13. □

The special case $u = g + 1$ in the above proposition is particularly interesting. The chain of reductions between strongly disjunct and threshold disjunct matrices implied by the above result in this case is schematically shown below.

$$
\begin{array}{c}
(2d, e; g+1, g)\text{-disjunct} \\
\Downarrow \\
\text{strongly } (2d, e; g+1)\text{-disjunct} \\
\Downarrow \\
(d, e; g+1, g)\text{-disjunct} \\
\Downarrow \\
\text{strongly } (d, e; g+1)\text{-disjunct}
\end{array}
$$

Therefore, when the upper threshold $u$ exceeds the gap parameter $g$ by one (equivalently, when the lower threshold $\ell$ is one), the two notions of threshold disjunct matrices and strongly disjunct matrices become equivalent up to a multiplicative factor in the sparsity parameter $d$. As discussed in Section 2, almost matching lower and upper bounds on the number of rows $m$ achievable by a strongly $(d, e; g + 1)$-disjunct matrix are known. Asymptotically, the number of rows must always satisfy $m = \Omega(d^{g+2}\log_d n + ed^{g+1})$, and moreover, a probabilistic construction achieves $m = O(d^{g+2}\log(n/d))$ and $e = \Omega(d\log(n/d))$ with probability $1 - o(1)$ (see Table 1). As a result, the upper and lower bounds on the number of rows of strongly disjunct and threshold disjunct matrices become equivalent up to multiplicative constants when the lower threshold is one.

Proposition 15 asserts that the notion of strongly disjunct matrices is in general stronger than that of threshold disjunct matrices. As we will see below, the former becomes strictly stronger when $\ell > 1$. As the lower threshold $\ell$ becomes larger, the discrepancy between the number of rows achievable by threshold disjunct matrices and strongly disjunct matrices becomes more significant (see Table 1).

Observation 3 

As pointed out after Definition 13, the generalized definition of regular matrices may affect the bounds obtained by our probabilistic and explicit constructions (Constructions 2 and 3) only by hidden factors depending on $u$ (essentially without any change in the proofs). For the case of generalized disjunct matrices, however, the bounds may substantially change depending on the gap parameter $g$.

Below we generalize Lemma 8 to the case of threshold-disjunct matrices and show that Construction 2 results in a $(d, \Omega_u(pd\log(n/d)/(1-p)^2); u, g)$-disjunct matrix (with probability $1 - o(1)$) if the number of measurements is increased by a factor $O(d^{g+1})$ compared to the regular case of Lemma 8. More precisely, we can now show the following lemma.

Lemma 16 (Lemma 8 generalized). For every $p \in [0, 1)$ and integer parameters $u > g \ge 0$, Construction 2 with $m' = O_u(d^{g+2}\log(n/d)/(1-p)^2)$ outputs a $(d, \Omega_u(pm'/d^{g+1}); u, g)$-disjunct matrix with probability $1 - o(1)$.

Proof. The proof essentially follows along the same lines as the proof of Lemma 8. The difference, compared to the case $g = 0$ covered by Lemma 8, is that we have a set $I \subseteq [n]$ of distinguished columns in Definition 13, where $|I| = g + 1$, and the random vector $w$ in the proof of Lemma 8 must have ones at all positions picked by $I$. With this requirement, the lower bound on the success probability $q$ in (3) becomes $c = \Omega_u(1/d^{g+1})$. The rest of the proof remains unchanged except for the new lower bound on $c$, which makes the error tolerance parameter $e$ in the proof lower bounded by $\Omega_u(pm'/d^{g+1})$, while increasing the parameter $m'$ to a quantity upper bounded by $O_u(d^{g+2}\log(n/d)/(1-p)^2)$. □

Observation 4 

Lemma 7 can be extended to positive gaps as follows.

Lemma 17 (Lemma 7 generalized). Suppose that $M_1$ is a $(d, e_1; u, g + 1)$-regular and $M_2$ is a strongly $(2d, e_2; g + 1)$-disjunct matrix. Then $M_1 \odot M_2$ is a $(d, (e_1 + 1)(e_2 + 1) - 1; u, g)$-disjunct matrix (here $u = \ell + g$, where $\ell$ is the lower threshold).

Proof. Let $M := M_1 \odot M_2$. Towards verifying that $M$ satisfies the requirements of Definition 13, consider a set $I \subseteq [n]$ of distinguished columns of $M$, where $n$ is the number of columns of the matrices and $|I| = g + 1$, in addition to critical and zero sets $S, Z \subseteq [n]$ as in Definition 13 satisfying $|Z| \le |S|$. Index the rows of $M$ naturally by the elements of $[m_1] \times [m_2]$, where $m_1$ and $m_2$ are the number of rows of $M_1$ and $M_2$, respectively, and the $(i, j)$th row of $M$ is the bitwise disjunction of the $i$th row of $M_1$ and the $j$th row of $M_2$.

Let $S' := S \setminus I$ and $Z' := Z \cup (S \setminus I)$. Observe that $|Z| \le |S| = |S'| + g + 1 = |S'| + |I|$ and $|Z'| \le 2d$. From Definition 13 (applied to $M_1$ with critical set $S'$ and zero set $Z$), there is a set $E_1 \subseteq [m_1]$ of size more than $e_1$ such that $M_1|_{S'}$ has weight exactly $\ell - 1 = u - |I|$ and $M_1|_Z$ is all zeros. Moreover, there is a set $E_2 \subseteq [m_2]$ of size more than $e_2$ at which $M_2|_I$ has all ones and $M_2|_{Z'}$ has all zeros. This means that, at all rows corresponding to $E_1 \times E_2$, the product matrix $M$ has all ones at positions corresponding to $I$, weight exactly $\ell - 1 + |I| = u$ at positions corresponding to $S$, and all zeros at positions corresponding to $Z$. Therefore, $M$ indeed $u$-satisfies any choice of the sets $I, S, Z$ at more than $(e_1 + 1)(e_2 + 1) - 1$ rows. □
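The product $M_1 \odot M_2$ used in Lemma 17 is easy to realize in code. The Python sketch below (names hypothetical) forms the rows indexed by $[m_1] \times [m_2]$ in row-major order and brute-forces the disjunct part of Definition 13 on a toy instance: $M_1$ is the $16 \times 4$ matrix of all Boolean rows, which is $(2, 1; 2, 1)$-regular (for $|S| = 2$ and $Z$ the other two columns, exactly two rows have weight 1 on $S$ and zeros on $Z$), and $M_2$ is the $4 \times 4$ identity matrix, which is strongly $(4, 0; 1)$-disjunct. Lemma 17 with $g = 0$ then predicts that the product is $(2, 1; 2, 0)$-disjunct, which the check confirms.

```python
from itertools import combinations


def or_product(M1, M2):
    """M1 ⊙ M2: row (i, j), in row-major order, is the bitwise OR of row i of M1
    and row j of M2."""
    return [[a | b for a, b in zip(r1, r2)] for r1 in M1 for r2 in M2]


def is_gap_disjunct(M, d, e, u, g):
    """Brute-force test of the disjunct part of Definition 13 (toy sizes only);
    only the largest zero sets are enumerated, which suffices."""
    n = len(M[0])
    for s in range(u, d + 1):
        for S in combinations(range(n), s):
            rest = [c for c in range(n) if c not in S]
            for Z in combinations(rest, min(s, len(rest))):
                for I in combinations(S, g + 1):
                    good = sum(1 for row in M
                               if sum(row[c] for c in S) == u
                               and not any(row[c] for c in Z)
                               and all(row[c] for c in I))
                    if good <= e:
                        return False
    return True


M1 = [[(x >> b) & 1 for b in range(4)] for x in range(16)]  # all 0/1 rows of length 4
M2 = [[int(i == j) for j in range(4)] for i in range(4)]     # 4 x 4 identity
P = or_product(M1, M2)                                       # 64 x 4
```

In this instance each choice of $(I, S, Z)$ is in fact covered by four product rows, more than the guaranteed $(e_1 + 1)(e_2 + 1) = 2$.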

Consequently, using the coding-theoretic construction of strongly disjunct matrices described in Appendix A, our explicit constructions of $(d, e; u)$-disjunct matrices obtained in Section 3 can be extended to the positive-gap model at the cost of a factor $O(d^g)$ increase in the number of measurements. The results from combining the above lemma with various constructions of regular and strongly disjunct matrices are summarized in Table 1.

5 Concluding remarks 

In this work we have introduced the combinatorial notion of regular binary matrices, which serves as an intermediate tool towards obtaining threshold testing designs.

Even though our construction, assuming an optimal lossless condenser, matches the probabilistic upper bound for regular matrices, the number of measurements in the resulting threshold testing scheme (obtained from the simple direct product in Construction 1) becomes larger than the probabilistic upper bound by a factor of $O(d\log n)$. Thus, an outstanding question is to come up with a direct construction of disjunct matrices that matches the probabilistic upper bound. Despite this, the notion of regular matrices may be of independent interest, and an interesting question is to obtain (nontrivial) concrete lower bounds on the number of rows of such matrices in terms of the parameters $n, d, e, u$ (and the gap parameter $g$ in the generalized definition of Section 4).

Moreover, in this work we have assumed the upper threshold $u$ to be a fixed constant, allowing the constants hidden in asymptotic notations to have a poor dependence on $u$. An outstanding question is whether the number of measurements can be reasonably controlled when the upper threshold $u$ and possibly the gap parameter $g$ become large; e.g., $g, u = \Omega(d)$.

Another interesting problem is decoding. While our constructions can combinatorially guarantee identification of sparse vectors, for applications it is important to have an efficient reconstruction algorithm as well. Contrary to the case of strongly disjunct matrices, which allow a straightforward decoding procedure (cf. [6]), it is not clear whether in general our notion of disjunct matrices allows efficient decoding, and thus it becomes important to look for constructions that are equipped with efficient reconstruction algorithms.

Finally, for clarity of exposition, in this work we have only focused on asymptotic trade-offs, and it would be nice to obtain good, finite-length estimates on the obtained bounds that are useful for applications.
A Strongly disjunct matrices from codes

In this appendix we describe a construction of strongly disjunct matrices (as in Definition 3), which is a straightforward extension of the classical result of Kautz and Singleton [24] for the construction of combinatorial designs. Construction 5 explains the idea, which is analyzed in Lemma 18 below. In this section we use standard tools from the theory of error-correcting codes. The interested reader is referred to standard texts in coding theory (e.g., the books by MacWilliams and Sloane [28], van Lint [26], and Roth [32]) for background.



• Given: An $(\tilde n, k, \tilde d)_q$ error-correcting code¹⁰ $C \subseteq [q]^{\tilde n}$, and an integer parameter $u > 0$.

• Output: An $m \times n$ Boolean matrix $M$, where $n = q^k$ and $m = \tilde n q^u$.

• Construction: First, consider the mapping $\varphi\colon [q] \to \{0,1\}^{q^u}$ from $q$-ary symbols to column vectors of length $q^u$, defined as follows. Index the coordinates of the output vector by the $u$-tuples from the set $[q]^u$. Then $\varphi(x)$ has a 1 at position $(a_1, \ldots, a_u)$ if and only if there is an $i \in [u]$ such that $a_i = x$. Arrange all codewords of $C$ as columns of an $\tilde n \times q^k$ matrix $M'$ with entries from $[q]$. Then replace each entry $x$ of $M'$ with $\varphi(x)$ to obtain the output $m \times n$ matrix $M$.



Construction 5: Extension of Kautz-Singleton's method [24].

Lemma 18. Construction 5 outputs a strongly $(d, e; u)$-disjunct matrix for every $d < (\tilde n - e)/((\tilde n - \tilde d)u)$.

Proof. Let $C := \{c_1, \ldots, c_u\} \subseteq [n]$ and $C' := \{c'_1, \ldots, c'_d\} \subseteq [n]$ be disjoint subsets of column indices. We wish to show that, for more than $e$ rows of $M$, the entries at positions picked by $C$ are all ones while those picked by $C'$ are all zeros. For each $j \in [n]$, denote the $j$th column of $M'$ by $M'(j)$, and let $M'(C) := \{M'(c_j) : j \in [u]\}$ and $M'(C') := \{M'(c'_j) : j \in [d]\}$.

From the minimum distance of the code, we know that every two distinct columns of $M'$ agree in at most $\tilde n - \tilde d$ positions. By a union bound, for each $i \in [d]$, the number of positions where $M'(c'_i)$ agrees with one or more of the codewords in $M'(C)$ is at most $u(\tilde n - \tilde d)$, and the number of positions where some vector in $M'(C')$ agrees with one or more of those in $M'(C)$ is at most $du(\tilde n - \tilde d)$.

By assumption, we have $\tilde n - du(\tilde n - \tilde d) > e$, and thus, for a set $E \subseteq [\tilde n]$ of size greater than $e$, at positions picked by $E$ none of the codewords in $M'(C')$ agree with any of the codewords in $M'(C)$.

¹⁰ We use the notation $(n, k, d)_q$ code for a $q$-ary code of length $n$, size $q^k$, and minimum distance at least $d$.
Now let $w \in [q]^n$ be any of the rows of $M'$ picked by $E$, and consider the $q^u \times n$ Boolean matrix $W$ formed by applying the mapping $\varphi(\cdot)$ on each entry of $w$. We know that $\{w(c_j) : j \in [u]\} \cap \{w(c'_j) : j \in [d]\} = \emptyset$. Thus we observe that the particular row of $W$ indexed by $(w(c_1), \ldots, w(c_u))$ (and in fact, any of its permutations) must have all ones at positions picked by $C$ and all zeros at those picked by $C'$. As any such row is a distinct row of $M$, it follows that $M$ is strongly $(d, e; u)$-disjunct. □
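Construction 5 is straightforward to implement when the code is a Reed-Solomon code over a prime field. The Python sketch below (names hypothetical; restricting to prime $q$, so that arithmetic modulo $q$ is a field, is an assumption made to keep it short) stores each column of $M$ as an integer bit mask whose bit $(i, a)$ is set when some coordinate of the tuple $a \in [q]^u$ equals the $i$th symbol of the codeword, and then checks strong disjunctness by brute force. For the $[5, 2, 4]_5$ Reed-Solomon code, Lemma 18 guarantees a strongly $(2, 2; 1)$-disjunct matrix (since $2 < (5-2)/1$) and a strongly $(2, 0; 2)$-disjunct matrix (since $2 < 5/2$).

```python
from itertools import combinations, product


def rs_codewords(q, k, length):
    """All q^k codewords of a Reed-Solomon code over GF(q), q prime (assumed):
    evaluations of the polynomials of degree < k at the points 0..length-1."""
    assert length <= q
    words = []
    for coeffs in product(range(q), repeat=k):
        word = []
        for x in range(length):
            v = 0
            for c in reversed(coeffs):  # Horner's rule mod q
                v = (v * x + c) % q
            word.append(v)
        words.append(word)
    return words


def construction5(code, q, u):
    """Column masks of M: bit (i, a), a in [q]^u, is set iff a_j == word[i] for some j."""
    tuples = list(product(range(q), repeat=u))
    masks = []
    for word in code:
        mask = 0
        for i, sym in enumerate(word):
            for idx, a in enumerate(tuples):
                if sym in a:
                    mask |= 1 << (i * len(tuples) + idx)
        masks.append(mask)
    return masks


def is_strongly_disjunct(masks, d, e, u):
    """Brute force: for disjoint C (|C| = u) and C' (|C'| = d), more than e rows
    must be 1 on all of C and 0 on all of C'. Taking |C'| = d suffices, because
    enlarging C' can only remove rows."""
    n = len(masks)
    for C in combinations(range(n), u):
        common = -1  # all ones; Python ints are unbounded
        for c in C:
            common &= masks[c]
        rest = [c for c in range(n) if c not in C]
        for Cp in combinations(rest, d):
            union = 0
            for c in Cp:
                union |= masks[c]
            if bin(common & ~union).count("1") <= e:
                return False
    return True


code = rs_codewords(5, 2, 5)  # the [5, 2, 4]_5 Reed-Solomon code
```

The bound of Lemma 18 is tight here for $u = 1$: the zero polynomial, $x$, and $x - 1$ leave only three uncovered positions, so the matrix is not strongly $(2, 3; 1)$-disjunct.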

Here we mention a few specific instantiations of the above construction. Namely, we will first consider the family of Reed-Solomon codes, which are also used in the original work of Kautz and Singleton [24], and then move on to the family of algebraic geometric (AG) codes on the Tsfasman-Vladut-Zink (TVZ) bound, Hermitian codes, and finally, codes on the Gilbert-Varshamov (GV) bound.

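Reed-Solomon codes: Let $p \in [0,1)$ be an arbitrary "noise" parameter. Take $\mathcal{C}$ to be an $[\tilde n, k, \tilde d]_{\tilde n}$ Reed-Solomon code over an alphabet of size $\tilde n$ (which we assume to be a prime power), where $\tilde d = \tilde n - k + 1$. We then get a strongly $(d,e;u)$-disjunct matrix with $m = O\big((du\log n/(1-p))^{u+1}\big)$ rows and $e = p\tilde n = \Omega(pdu(\log n)/(1-p))$. Indeed, taking $k = \lceil \log_{\tilde n} n \rceil$, we have $\tilde n - \tilde d = k - 1 < \log_{\tilde n} n \le \log n$. By Lemma 18, any prime power $\tilde n \ge du\log n/(1-p)$ therefore works. Choosing one that is at most twice this bound gives $m = \tilde n^u \cdot \tilde n = \tilde n^{u+1}$ rows.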
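To see how the Reed-Solomon parameters come out for concrete values, the following Python sketch searches for the smallest alphabet satisfying the condition of Lemma 18, with $\tilde n = q$, $\tilde n - \tilde d = k - 1$ and $e = \lfloor pq \rfloor$. It is a hypothetical helper, restricted to prime $q$ for simplicity (prime powers would also do). The returned $m = q^u\tilde n = q^{u+1}$ is the row count for these choices, so the constants hidden in the $O(\cdot)$ bound become visible.

```python
import math


def is_prime(x):
    """Trial division; adequate for the small alphabets used here."""
    return x >= 2 and all(x % f for f in range(2, math.isqrt(x) + 1))


def rs_parameters(n, d, u, p):
    """Smallest prime q for which a [q, k, q - k + 1]_q Reed-Solomon code
    satisfies the condition of Lemma 18 with e = floor(p * q).

    Here n_tilde = q, n_tilde - d_tilde = k - 1, and k is the least integer
    with q**k >= n.  Primes only (rather than prime powers) is a simplification.
    """
    q = 2
    while True:
        if is_prime(q):
            k = 1
            while q ** k < n:
                k += 1
            e = math.floor(p * q)
            # Lemma 18: d < (n_tilde - e) / ((n_tilde - d_tilde) * u)
            if d * u * (k - 1) < q - e:
                return {"q": q, "k": k, "e": e, "m": q ** (u + 1)}
        q += 1


print(rs_parameters(25, 2, 1, 0.4))  # {'q': 5, 'k': 2, 'e': 2, 'm': 25}
```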

AG codes on the TVZ bound: Another interesting family for the code $\mathcal{C}$ is the family of algebraic geometric codes that attain the Tsfasman-Vlăduţ-Zink bound (cf. [21, 36]). This family is defined over any alphabet size $q \ge 49$ that is a square prime power, and achieves a minimum distance $\tilde d \ge \tilde n - k - \tilde n/(\sqrt{q}-1)$. Let $e := p\tilde n$, for a noise parameter $p \in [0,1)$. By Lemma 18, the underlying code $\mathcal{C}$ needs to have minimum distance at least $\tilde n(1 - (1-p)/(du))$. Thus, in order to be able to use the above-mentioned family of AG codes, we need to have $q \gtrsim (du/(1-p))^2 =: q_0$. Let us take an appropriate $q \in [2q_0, 8q_0]$ and, following Lemma 18, set $\tilde n - \tilde d = \lceil \tilde n(1-p)/(du) \rceil$. Thus, the dimension of $\mathcal{C}$ becomes at least

$$k \ge \tilde n - \tilde d - \frac{\tilde n}{\sqrt{q}-1} \ge \tilde n\left(\frac{1-p}{du} - \frac{1}{\sqrt{q}-1}\right) = \Omega\big(\tilde n/\sqrt{q_0}\big),$$

and subsequently,$^{11}$ we get that $\log n = k\log q \ge k = \Omega(\tilde n/\sqrt{q_0})$. Now, noting that $m = q^u\tilde n$, we conclude that

$$m = q^u\tilde n = O\big(q_0^{u+1/2}\log n\big) = O\left(\left(\frac{du}{1-p}\right)^{2u+1}\log n\right),$$

and $e = \Omega(pdu(\log n)/(1-p))$.
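The choice of the alphabet in the interval $[2q_0, 8q_0]$ can be made concrete. The Python sketch below is a hypothetical helper that uses only squares of primes. It picks $q$ and evaluates the lower bound on the rate $k/\tilde n$ from the display above. Bertrand's postulate is what keeps $q \le 8q_0$. The sketch does not construct the AG code itself.

```python
import math


def is_prime(x):
    """Trial division; fine for the small values used here."""
    return x >= 2 and all(x % f for f in range(2, math.isqrt(x) + 1))


def ag_alphabet(d, u, p):
    """Choose q = P**2 (P prime) in [2*q0, 8*q0], where q0 = (du/(1-p))**2.

    Only squares of primes are tried (a simplification; any square prime power
    would do).  Bertrand's postulate gives a prime P in [sqrt(2 q0), 2 sqrt(2 q0)],
    so q <= 8*q0.  The TVZ family also needs q >= 49, which is not enforced here.
    Also returns the lower bound (1-p)/(du) - 1/(sqrt(q) - 1) on k / n_tilde.
    """
    q0 = (d * u / (1 - p)) ** 2
    P = 2
    while P * P < 2 * q0 or not is_prime(P):
        P += 1
    q = P * P
    return q, (1 - p) / (d * u) - 1 / (math.sqrt(q) - 1)


q, rate_lb = ag_alphabet(10, 2, 0.0)
print(q, round(rate_lb, 4))  # 841 0.0143
```

The rate bound is positive whenever $du/(1-p) > 1/(\sqrt{2}-1) \approx 2.42$, because then $\sqrt{q} \ge \sqrt{2q_0} > \sqrt{q_0} + 1$.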

We see that the dependence of the number of measurements on the sparsity parameter $d$ is worse for AG codes than for Reed-Solomon codes by a factor of $d^u$. On the other hand, the construction from AG codes benefits from a linear dependence on $\log n$, compared to $\log^{u+1} n$ for Reed-Solomon codes. Thus, AG codes become more favorable only when the sparsity is substantially low, namely when $d \ll \log n$.

Hermitian codes. A particularly nice family of AG codes arises from the Hermitian function field. Let $q'$ be a prime power and $q := q'^2$. Then the Hermitian function field over $\mathbb{F}_q$ is a finite extension of the rational function field $\mathbb{F}_q(x)$, denoted by $\mathbb{F}_q(x,y)$, where $y^{q'} + y = x^{q'+1}$. The structure of this function field is relatively well understood. The family of Goppa codes defined over the rational points of the Hermitian function field is known as Hermitian codes. This family was recently used by Ben-Aroya and Ta-Shma [1] for the construction of small-bias sets. Below we quote some parameters of Hermitian codes from their work.

$^{11}$Note that, given the parameters $p, d, n$, the choice of $q$ depends on $p$ and $d$, as explained above. One can then choose the code length $\tilde n$ to be the smallest integer for which $q^k \ge n$. For the sake of clarity, however, we have assumed that $q^k = n$, which does not affect the asymptotic bounds.

The number of rational points of the Hermitian function field is equal to $q^{3/2} + 1$, which includes a common pole $Q_\infty$ of $x$ and $y$. The genus of the function field is $g = \sqrt{q}(\sqrt{q}-1)/2$. For some integer parameter $r$, we take $G := rQ_\infty$ as the divisor defining the Riemann-Roch space $\mathcal{L}(G)$ of the code $\mathcal{C}$. The evaluation points of the code are the rational points other than $Q_\infty$. Thus the length of $\mathcal{C}$ becomes $\tilde n = q^{3/2}$. Moreover, the minimum distance of the code is at least $\tilde n - \deg(G) = \tilde n - r$. When $r \ge 2g - 1$, the dimension of the code is given by the Riemann-Roch theorem and is equal to $r - g + 1$. In the low-degree regime where $r < 2g - 1$, the dimension $k$ of the code is the size of the set

$$W = \{(i,j) \in \mathbb{N}^2 : j \le q' - 1 \,\wedge\, iq' + j(q'+1) \le r\},$$

whose elements correspond to the elements of the Weierstrass semigroup at $Q_\infty$ that do not exceed $r$.
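As a sanity check on the description of $W$, the following Python sketch counts $|W|$ directly. It then compares this count with the Riemann-Roch value $r - g + 1$ in the range $r \ge 2g - 1$. The sketch is our own and purely combinatorial; `qp` stands for $q'$. It relies on the standard fact that the pole orders $q'$ and $q'+1$ of $x$ and $y$ generate the Weierstrass semigroup at $Q_\infty$. No finite-field arithmetic is performed.

```python
def hermitian_genus(qp):
    """Genus q'(q'-1)/2 of the Hermitian function field over F_q, q = q'**2."""
    return qp * (qp - 1) // 2


def weierstrass_count(qp, r):
    """|W| for W = {(i, j) in N^2 : j <= q'-1 and i*q' + j*(q'+1) <= r}."""
    count = 0
    for j in range(qp):                 # j = 0, ..., q' - 1
        budget = r - j * (qp + 1)
        if budget >= 0:
            count += budget // qp + 1   # admissible i = 0, ..., budget // q'
    return count


def hermitian_code_params(qp, r):
    """(length, dimension, designed distance) of the code with G = r * Q_inf.

    Length is q'**3 = q**(3/2); the dimension |W| assumes r < length, so that
    evaluation is injective; the distance is the lower bound n_tilde - r.
    """
    n_tilde = qp ** 3
    return n_tilde, weierstrass_count(qp, r), n_tilde - r


print(hermitian_code_params(3, 8))  # (27, 6, 19)

# Riemann-Roch check: for r >= 2g - 1 the count equals r - g + 1.
g = hermitian_genus(3)
print([weierstrass_count(3, r) - (r - g + 1) for r in range(2 * g - 1, 2 * g + 5)])
# [0, 0, 0, 0, 0, 0]
```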

Now, given parameters $d, p$ of the disjunct matrix, define $\rho := (1-p)/((d+1)u)$, take the alphabet size $q$ to be a square prime power, and set $r := \rho q^{3/2}$. First we consider the case where $r < 2g - 1 = q - \sqrt{q} - 1$. In this case, the dimension of the Hermitian code becomes $k = |W| = \Omega(r^2/q) = \Omega(\rho^2 q^2)$. The distance $\tilde d$ of the code satisfies $\tilde d \ge \tilde n - r = \tilde n(1-\rho)$. Thus, for $e := p\tilde n$, the conditions of Lemma 18 are satisfied, since $(\tilde n - e)/((\tilde n - \tilde d)u) \ge (1-p)/(\rho u) = d + 1$. The number of rows of the resulting measurement matrix becomes $m = q^{u+3/2}$, since $\tilde n = q^{3/2}$. Therefore,

$$\log n = k\log q \ge k = \Omega(\rho^2 q^2) \;\Longrightarrow\; q = O\big(\sqrt{\log n}/\rho\big) \;\Longrightarrow\; m = O\left(\left(\frac{d\sqrt{\log n}}{1-p}\right)^{u+3/2}\right),$$

and in order to ensure that $r < 2g - 1$, we need to have $du/(1-p) \gtrsim \sqrt{\log n}$. On the other hand, when $du/(1-p) \ll \sqrt{\log n}$, we are in the high-degree regime. In this case the dimension of the code becomes $k = r - g + 1 = \Omega(r) = \Omega(\rho q^{3/2})$, and we will thus have

$$q = O\big((\log n/\rho)^{2/3}\big) \;\Longrightarrow\; m = O\left(\left(\left(\frac{d\log n}{1-p}\right)^{2/3}\right)^{u+3/2}\right).$$

Altogether, we conclude that Construction 5 with Hermitian codes results in a strongly $(d,e;u)$-disjunct matrix with

$$m = O\left(\left(\frac{d\sqrt{\log n}}{1-p} + \left(\frac{d\log n}{1-p}\right)^{2/3}\right)^{u+3/2}\right)$$

rows, where $e = p\cdot\big(d(\log n)/(1-p) + (d\sqrt{\log n}/(1-p))^{3/2}\big)$. Compared to Reed-Solomon codes, the number of measurements has a slightly worse dependence on $d$, but a much better dependence on $n$. Compared to AG codes on the TVZ bound, the dependence on $d$ is better while the dependence on $n$ is inferior.
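The comparison above can be illustrated numerically. The Python sketch below evaluates the three row-count bounds with all hidden constants set to 1 and with $u$ treated as a given constant. Setting the constants to 1 is an assumption, so only the relative order of the results is meaningful.

```python
import math


def leading_terms(d, u, p, log_n):
    """Row counts for three instantiations of Construction 5, constants dropped.

    The formulas are the O(.) bounds derived in the text; log_n stands for log n.
    Only the relative order is meaningful, not the absolute values.
    """
    a = 1 - p
    return {
        "reed_solomon": (d * u * log_n / a) ** (u + 1),
        "ag_tvz": (d * u / a) ** (2 * u + 1) * log_n,
        "hermitian": (d * math.sqrt(log_n) / a
                      + (d * log_n / a) ** (2 / 3)) ** (u + 1.5),
    }


for d, log_n in [(2, 1000), (1000, 20)]:
    terms = leading_terms(d, 1, 0.0, log_n)
    print(d, log_n, sorted(terms, key=terms.get))
# 2 1000 ['ag_tvz', 'hermitian', 'reed_solomon']
# 1000 20 ['reed_solomon', 'hermitian', 'ag_tvz']
```

With $d = 2$ and $\log n = 1000$, AG codes give the fewest rows and Reed-Solomon codes the most. With $d = 1000$ and $\log n = 20$ the order flips. In both cases the Hermitian construction sits in between.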

Codes on the GV bound. A $q$-ary $(\tilde n, k, \tilde d)$-code (of sufficiently large length) is said to be on the Gilbert-Varshamov bound if it satisfies $k \ge \tilde n(1 - h_q(\tilde d/\tilde n))$, where $h_q(\cdot)$ is the $q$-ary entropy function defined as

$$h_q(x) := x\log_q(q-1) - x\log_q(x) - (1-x)\log_q(1-x).$$

It is well known that a random linear code achieves the bound with overwhelming probability (cf. [28]). Now we apply Lemma 18 to a code on the GV bound and calculate the resulting parameters. Let $\rho := (1-p)/(4du)$, choose any alphabet size $q \in [1/\rho, 2/\rho]$, and let $\mathcal{C}$ be any $q$-ary code of length $\tilde n$ on the GV bound, with minimum distance $\tilde d \ge \tilde n(1 - 2/q)$. Lemma 18 then applies with $e := p\tilde n$, since $(\tilde n - e)/((\tilde n - \tilde d)u) \ge (1-p)q/(2u) \ge 2d$. By the Taylor expansion of the function $h_q(x)$ around $x = 1 - 1/q$, we see that the dimension of $\mathcal{C}$ asymptotically behaves as $k = \Theta(\tilde n/(q\log q))$. Thus, the number of columns of the resulting measurement matrix becomes $n = q^k = 2^{\Theta(\tilde n/q)}$. Moreover, the number $m$ of its rows becomes

$$m = q^u\tilde n = O(q^{u+1}\log n) = O\big((d/(1-p))^{u+1}\log n\big),$$

and the matrix becomes strongly $(d,e;u)$-disjunct for $e = p\tilde n = \Omega(pd(\log n)/(1-p))$.
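The Taylor-expansion step and the resulting parameters can be checked numerically. The Python sketch below computes $h_q$ and the GV rate $1 - h_q(1 - 2/q)$, and sizes $\tilde n$, $m$ and $e$ for given $d, u, p$ and $\log n$. It only evaluates the existential GV guarantee, so no code is constructed. As a simplification, it picks $q$ prime.

```python
import math


def is_prime(x):
    """Trial division; fine for the alphabet sizes used here."""
    return x >= 2 and all(x % f for f in range(2, math.isqrt(x) + 1))


def q_ary_entropy(x, q):
    """h_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x), for 0 < x < 1."""
    return (x * math.log(q - 1) - x * math.log(x)
            - (1 - x) * math.log(1 - x)) / math.log(q)


def gv_parameters(d, u, p, log2_n):
    """Sketch of the parameters of Construction 5 with a code on the GV bound.

    rho = (1-p)/(4du) and q is the least prime >= 1/rho (at most 2/rho by
    Bertrand's postulate).  A code on the GV bound with relative distance
    1 - 2/q has rate at least 1 - h_q(1 - 2/q); existence only, nothing is built.
    """
    rho = (1 - p) / (4 * d * u)
    q = math.ceil(1 / rho)
    while not is_prime(q):
        q += 1
    rate = 1 - q_ary_entropy(1 - 2 / q, q)
    k = math.ceil(log2_n / math.log2(q))  # smallest k with q**k >= n
    n_tilde = math.ceil(k / rate)         # length so that n_tilde * rate >= k
    e = math.floor(p * n_tilde)
    return {"q": q, "k": k, "n_tilde": n_tilde, "m": q ** u * n_tilde, "e": e}


# The rate behaves like Theta(1/(q log q)): rate * q * ln(q) stays bounded.
for q in (8, 64, 4096):
    rate = 1 - q_ary_entropy(1 - 2 / q, q)
    print(q, round(rate * q * math.log(q), 3))

print(gv_parameters(10, 2, 0.1, 1000))
```

By our own expansion, $1 - h_q(1-2/q) = (2\ln 2 - 1)/(q\ln q) + O(1/(q^2\log q))$. The printed products therefore approach $2\ln 2 - 1 \approx 0.386$ as $q$ grows, in line with $k = \Theta(\tilde n/(q\log q))$.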
We remark that, for the range of parameters that we are interested in, Porat and Rothschild [30] have come up with a deterministic construction of linear codes on the GV bound that runs in time $\mathrm{poly}(q^k)$, and thus in time polynomial in the size of the resulting measurement matrix.

Their construction is based on a derandomization of the probabilistic argument for random linear codes using the method of conditional expectations. As such, it can be considered weakly explicit, in the sense that the entire measurement matrix can be computed in time polynomial in its size. For a fully explicit construction, one must ideally be able to deterministically compute any single entry of the measurement matrix in time $\mathrm{poly}(d, \log n)$, which is not the case for this construction. Altogether, we obtain the following result.

Theorem 19. There is an algorithm that, given integer parameters $d < n$ and $u > 0$ and a real parameter $p \in [0,1)$, outputs an $m \times n$ binary matrix which is strongly $(d,e;u)$-disjunct. The parameters $m$ and $e$ satisfy the bounds $m = O((d/(1-p))^{u+1}\log n)$ and $e = \Omega(pd(\log n)/(1-p))$. Moreover, the running time of the algorithm is polynomial in $mn$. □

Using a standard probabilistic argument, it is easy to see the following. A random $m \times n$ matrix, where each entry is an independent Bernoulli random variable with probability $1/d$ of being 1, is with overwhelming probability strongly $(d,e;u)$-disjunct for $e = \Omega(pd\log(n/d)/(1-p)^2)$ and $m = O(d^{u+1}\log(n/d)/(1-p)^2)$. The proof is very similar to the proof of Lemma 8. Thus we see that, for a fixed $p$, Construction 5 with codes on the GV bound almost matches these parameters. Moreover, the explicit construction based on Reed-Solomon codes has the "right" dependence on the sparsity $d$. AG codes on the TVZ bound have a dependence on the vector length $n$ that matches that of random measurement matrices. Finally, the trade-off offered by the construction based on Hermitian codes lies between those of Reed-Solomon codes and AG codes.
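The standard probabilistic argument can be made quantitative with a union bound. The Python sketch below is our own minimal version and not necessarily the argument of Lemma 8. A random row satisfies a fixed pair $(C, C')$ with probability $\pi = d^{-u}(1-1/d)^d$, so the number of good rows is binomial. If $\binom{n}{u}\binom{n-u}{d}\Pr[\mathrm{Bin}(m,\pi) \le e] < 1$, then some $m \times n$ matrix is strongly $(d,e;u)$-disjunct. The sketch returns the smallest such $m$.

```python
import math


def log_binom_cdf(m, pi, e):
    """log P[Bin(m, pi) <= e], summed in log space to avoid underflow."""
    terms = [math.lgamma(m + 1) - math.lgamma(i + 1) - math.lgamma(m - i + 1)
             + i * math.log(pi) + (m - i) * math.log1p(-pi)
             for i in range(min(e, m) + 1)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def union_bound_ok(n, d, u, e, m):
    """True if C(n,u) * C(n-u,d) * P[Bin(m, pi) <= e] < 1, pi = d**-u (1-1/d)**d.

    pi is the chance that one random row is 1 on a fixed C and 0 on a fixed C'
    when entries are 1 independently with probability 1/d.  If the product is
    below 1, some m x n matrix is strongly (d, e; u)-disjunct.
    """
    pi = d ** -u * (1 - 1 / d) ** d
    log_pairs = math.log(math.comb(n, u)) + math.log(math.comb(n - u, d))
    return log_pairs + log_binom_cdf(m, pi, e) < 0


def sufficient_rows(n, d, u, e):
    """Smallest m passing the union-bound test (the test is monotone in m)."""
    assert d >= 2, "pi = 0 when d = 1 under this choice of bias"
    m = 1
    while not union_bound_ok(n, d, u, e, m):
        m += 1
    return m


print(sufficient_rows(100, 2, 1, 0))  # 99
```

This certifies existence only. The "overwhelming probability" statement needs a somewhat larger $m$, so that the union bound is small rather than merely below 1.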